(54) Method and apparatus for addressing voice controlled devices



(57) Voice controlled devices with speech recognition have user assignable appliance names and default
appliance names to address and control the voice controlled devices. Methods of controlling voice controlled
devices include addressing a voice controlled device by name and providing a command.
Description

FIELD OF THE INVENTION 

[0001] This invention relates generally to machine interfaces. More particularly, the invention relates to voice user
interfaces for devices. 

BACKGROUND OF THE INVENTION 

[0002] Graphical user interfaces (GUIs) for computers are well known. GUIs provide an intuitive and consistent
manner for human interaction with computers. Generally, once a person learns how to use a particular GUI, they can
operate any computer or device which operates using the same or similar GUI. Examples of popular GUIs are MAC OS
by Apple, and MS Windows by Microsoft. GUIs are now being ported to other devices. For example, the MS Windows
GUI has been ported from computers to palm tops, personal organizers, and other devices so that there is a common
GUI amongst a number of differing devices. However, as the name implies, GUIs require at least some sort of visual or
graphical display and an input device such as a keyboard, mouse, touch pad or touch screen. The displays and the
input devices tend to utilize space in a device, require additional components and increase the costs of a device.
Thus, it is desirable to eliminate the display and input devices from devices to save costs.

[0003] Recently, voice user interfaces (VUIs) have been introduced that utilize speech recognition methods to control
a device. However, these prior art VUIs have a number of shortcomings that prohibit them from being universally
utilized in all devices. Prior art VUIs are usually difficult to use. Prior art VUIs usually require some sort of display device
such as an LCD, or require a manual input device such as keypads or buttons, or require both a display and a manual
input device. Additionally, prior art VUIs usually are proprietary and restricted in use to a single make or model of hardware
device, or a single type of software application. They usually are not widely available, unlike computer operating
systems, and accordingly software programmers cannot write applications that operate with the VUI in a variety of
device types. Commands associated with prior art VUIs are usually customized for that single type of device or software
application. Prior art VUIs usually have additional limitations in supporting multiple users such as how to handle
personalization and security. Furthermore, prior art VUIs require that a user know of the existence of the device in advance.
Prior art VUIs have not provided ways of determining the presence of devices. Additionally, prior art VUIs usually
require a user to read instruction manuals or screen displayed commands to become trained in their use. Prior art VUIs
usually do not include audible methods for a user to learn commands. Furthermore, a user may be required to learn
how to use multiple prior art VUIs when utilizing multiple voice controlled devices due to a lack of standardization.
[0004] Generally, devices controlled by VUIs continue to require some sort of manual control of functions. With
some manual control required, a manual input device such as a button, keypad or a set of buttons or keypads is provided.
To assure proper manual entry, a display device such as an LCD, LED, or other graphics display device may be
provided. For example, many voice activated telephones require that telephone numbers be stored manually. In this
case a numeric keypad is usually provided for manual entry. An LCD is usually included to assure proper manual entry
and to display the status of the device. A speech synthesis or voice feedback system may be absent from these devices.
The addition of buttons and display devices increases the manufacturing cost of devices. It is desirable to be able to
eliminate all manual input and display from devices in order to decrease costs. Furthermore, it is more convenient to
remotely control devices without requiring specific buttons or displays.

[0005] Previously, voice controlled devices were used by few people. Additionally, they used near field microphones to
listen locally for voices. Many prior devices were fixed in some manner or not readily portable or were server based systems.
It is desirable to provide voice control capability for portable devices. It is desirable to provide either near field or far field
microphone technology in voice controlled devices. It is desirable to provide low cost voice control capability such that it is
included in more devices. However, these desires raise a problem when multiple users of multiple voice controlled
devices are in the same area. With multiple users and multiple voice controlled devices within audible range of each
other, it is difficult for voice controlled devices to discern which user to accept commands from and respond to.
For example, consider the case of voice controlled cell phones where one user in an environment of multiple users
wants to call home. The user issues a voice activated call home command. If more than one voice controlled cell phone
audibly hears the call home command, multiple voice controlled cell phones may respond and start dialing a home
telephone number. Previously this was not as significant a problem because there were few voice controlled devices.
[0006] Some voice controlled devices are speaker dependent. Speaker dependency refers to a voice controlled
device that requires training by a specific user before it may be used with that user. A speaker dependent voice controlled
device listens for tonal qualities in how phrases are spoken. Speaker dependent voice controlled devices do not lend
themselves to applications where multiple users or speakers are required to use the voice controlled device. This is
because they fail to efficiently recognize speech from users that they have not been trained by. It is desirable to provide
speaker independent voice controlled devices with a VUI requiring little or no training in order to recognize speech from
any user.

[0007] In order to achieve high accuracy speech recognition it is important that a voice controlled device avoid
responding to speech that is not directed to it. That is, voice controlled devices should not respond to background
conversation, to noises, or to commands to other voice controlled devices. However, filtering out background sounds must
not be so effective that it also prevents recognition of speech directed to the voice controlled device. Finding the right
mix of rejection of background sounds and recognition of speech directed to a voice controlled device is particularly
challenging in speaker-independent systems. In speaker-independent systems, the voice controlled device must be
able to respond to a wide range of voices, and therefore cannot use a highly restrictive filter for background sounds. In
contrast, a speaker-dependent system need only listen for a particular person's voice, and thus can employ a more
stringent filter for background sounds. Despite this advantage in speaker-dependent systems, filtering out background
sounds is still a significant challenge.

[0008] In some prior art systems, background conversation has been filtered out by having a user physically press
a button in order to activate speech recognition. The disadvantage of this approach is that it requires the user to interact
with the voice controlled device physically, rather than strictly by voice or speech. One of the potential advantages of
voice controlled devices is that they offer the promise of true hands-free operation. Elimination of the need to press a
button to activate speech recognition would go a long way to making this hands-free objective achievable.
[0009] Additionally, in locations with a number of people talking, a voice controlled device should disregard all
speech unless it is directed to it. For example, if a person says to another person "I'll call John", the cellphone in his
pocket should not interpret the "call John" as a command. If there are multiple voice controlled devices in one location,
there should be a way to uniquely identify which voice controlled device a user wishes to control. For example, consider
a room that may have multiple voice controlled telephones - perhaps a couple of desktop phones, and multiple
cellphones - one for each person. If someone were to say "Call 555-1212", each phone may try to place the call unless
there was a means for them to disregard certain commands. In the case where a voice controlled device is to be
controlled by multiple users, it is desirable for the voice controlled device to know which user is commanding it. For
example, a voice controlled desktop phone in a house may be used by a husband, wife and child. Each could have
their own phonebook of frequently called numbers. When the voice controlled device is told "Call Mother", it needs to
know which user is issuing the command so that it can call the right person (i.e. should it call the husband's mother, the
wife's mother, or the child's mother at her work number?). Additionally, a voice controlled device with multiple users may
need a method to enforce security to protect it from unauthorized use or to protect a user's personalized settings from
unintentional or malicious interactions by others (including snooping, changing, deleting, or adding to the settings).
Furthermore, in a location where there are multiple voice controlled devices, there should be a way to identify the presence
of voice controlled devices. For example, consider a traveler arriving at a new hotel room. Upon entering the hotel room,
the traveler would like to know what voice controlled devices may be present and how to control them. It is desirable
that the identification process be standardized so that all voice controlled devices may be identified in the same way.

[0010] In voice controlled devices, it is desirable to store phrases under voice control. A phrase is defined as a single
word, or a group of words treated as a unit. This storing might be to set options or create personalized settings. For
example, in a voice-controlled telephone, it is desirable to store people's names and phone numbers under voice control 
into a personalized phone book. At a later time, this phone book can be used to call people by speaking their name (e.g. 
"Cellphone call John Smith", or "Cellphone call Mother"). 

[0011] Prior art approaches to storing the phrase ("John Smith") operate by storing the phrase in a compressed,
uncompressed, or transformed manner that attempts to preserve the actual sound. Detection of the phrase in a command
(i.e. detecting that John is to be called in the example above) then relies on a sound-based comparison between
the original stored speech sound and the spoken command. Sometimes the stored waveform is transformed into the
frequency domain and/or is time adjusted to facilitate the match, but in any case the fundamental operation being performed
is one that compares the actual sounds. The stored sound representation and comparison for detection suffers
from a number of disadvantages. If a speaker's voice changes, perhaps due to a cold, stress, fatigue, noisy or distorting
connection by telephone, or other factors, the comparison typically is not successful and stored phrases are not recognized.
Because the phrase is stored as a sound representation, there is no way to extract a text-based representation
of the phrase. Additionally, storing a sound representation results in a speaker dependent system. It is unlikely that
another person could speak the same phrase using the same sounds in a command and have it be correctly recognized.
It would not be reliable, for example, for a secretary to store phonebook entries and a manager to make calls
using those entries. It is desirable to provide a speaker independent storage means. Additionally, if the phrases are
stored as sound representations, the stored phrases cannot be used in another voice controlled device unless the
same waveform processing algorithms are used by both voice controlled devices. It is desirable to recognize spoken
phrases and store them in a representation such that, once stored, the phrases can be used for speaker independent
recognition and can be used by multiple voice controlled devices.

[0012] Presently computers and other devices communicate commands and data to other computers or devices
using modem, infrared or wireless radio frequency transmission. The transmitted command and/or data are usually of
a digital form that only the computer or device may understand. In order for a human user to understand the command
or data it must be decoded by a computer and then displayed in some sort of format such as a number or ASCII text on
a display. When the command and/or data are transmitted they are usually encoded in some digital format understood
by the computer or devices or transmitting equipment. As voice controlled devices become more prevalent, it will be
desirable for voice controlled devices to communicate with each other using human-like speech in order to avoid providing
additional circuitry for communication between voice controlled devices. It is further desirable to allow multiple
voice controlled devices to exchange information machine-to-machine without human user intervention.

BRIEF SUMMARY OF THE INVENTION 

[0013] The present invention includes a method, apparatus and system as described in the claims. Briefly, a standard
voice user interface is provided to control various devices by using standard speech commands. The standard VUI
provides a set of standard VUI commands and syntax for the interface between a user and the voice controlled device.
The standard VUI commands include an identification phrase to determine if voice controlled devices are available in
an environment. Other standard VUI commands provide for determining the names of the voice controlled devices and
altering them.

[0014] Voice controlled devices are disclosed. A voice controlled device is defined herein as any device that is controlled
by speech, which is either audible or non-audible. A voice controlled device may also be referred to herein as an
appliance, a machine, a voice controlled appliance, a voice controlled electronic device, a name activated electronic
device, a speech controlled device, a voice activated electronic appliance, a voice activated appliance, a voice controlled
electronic device, or a self-identifying voice controlled electronic device.

[0015] In order to gain access to the functionality of voice controlled devices, a user communicates to the voice controlled
device one of its associated appliance names after a period of relative silence. The appliance name may be a
default name or a user-assignable name. The voice controlled device may have a plurality of user-assignable names
associated with it for providing personalized functionality to each user.

[0016] Other aspects of the present invention are described in the detailed description. 

BRIEF DESCRIPTIONS OF THE DRAWINGS 

[0017]

FIG. 1A is an illustration of an environment containing voice controlled devices of the present invention.

FIG. 1B is an illustration of remote communications with the voice controlled devices in the environment illustrated
in FIG. 1A.

FIG. 2 is an illustration of exemplary voice controlled devices. 

FIG. 3 is a detailed block diagram of the voice controlled device of the present invention. 

FIG. 4 is a detailed block diagram of a voice communication chip. 

FIG. 5 is a block diagram of the standard voice user interface of the present invention. 

FIGs. 6A-6C are flow charts of the core command structure for the standard voice user interface of the present
invention.

FIGs. 6D-6E are flow charts of the telephone command structure for the standard voice user interface of the 
present invention. 

FIG. 7 is a flow chart of the "Store Name" telephone command structure for the standard voice user interface of the
present invention. 

FIG. 8 is a flow chart of the "Delete Name" telephone command structure for the standard voice user interface of
the present invention.

FIGs. 9A-9B are flow charts of the "GETYESNO" function for the standard voice user interface of the present
invention.

FIGs. 10A-10C are flow charts of the "GETRESPONSE" function for the standard voice user interface of the
present invention.

FIG. 11 is a flow chart of the "GETRESPONSEPLUS" function for the standard voice user interface of the present
invention.

FIG. 12 is a flow chart of the "LISTANDSELECT" function for the standard voice user interface of the present
invention.

FIG. 13 is a block diagram of a pair of voice controlled devices communicating using the standard voice user
interface of the present invention.

Like reference numbers and designations in the drawings indicate like elements providing similar functionality.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT

[0018] In the following detailed description of the present invention, numerous specific details are set forth in order 
to provide a thorough understanding of the present invention. However, it will be obvious to one skilled in the art that the 
present invention may be practiced without these specific details. In other instances well known methods, procedures, 
components, and circuits have not been described in detail so as not to unnecessarily obscure aspects of the present 
invention.

[0019] The present invention includes a method, apparatus and system for standard voice user interface and voice
controlled devices. Briefly, a standard voice user interface is provided to control various devices by using standard
speech commands. The standard VUI provides a set of core VUI commands and syntax for the interface between a
user and the voice controlled device. The core VUI commands include an identification phrase to determine if voice
controlled devices are available in an environment. Other core VUI commands provide for determining the names of the
voice controlled devices and altering them.

[0020] Voice controlled devices are disclosed. A voice controlled device is defined herein as any device that is controlled
by speech, which is either audible or non-audible. Audible and non-audible are defined herein later. A voice controlled
device may also be referred to herein as an appliance, a machine, a voice controlled appliance, a voice controlled
electronic device, a name activated electronic device, a speech controlled device, a voice activated electronic appliance,
a voice activated appliance, a voice controlled electronic device, or a self-identifying voice controlled electronic
device.

[0021] The present invention is controlled by and communicates using audible and non-audible speech. Speech as
defined herein for the present invention encompasses a) a signal or information, such that if the signal or information
were passed through a suitable device to convert it to variations in air pressure, the signal or information could be heard
by a human being and would be considered language, and b) a signal or information comprising actual variations in air
pressure, such that if a human being were to hear the signal, the human would consider it language. Audible speech
refers to speech that a human can hear unassisted. Non-audible speech refers to any encodings or representations of
speech that are not included under the definition of audible speech, including that which may be communicated outside
the hearing range of humans and transmission media other than air. The definition of speech includes speech that is
emitted from a human and emitted from a machine (including machine speech synthesis, playback of previously 
recorded human speech such as prompts, or other forms). 

[0022] Prompts which are communicated by a voice controlled device and phrases which are communicated by a
user may be in languages or dialects other than English or a combination of multiple languages. A phrase is defined
herein as a single word, or a group of words treated as a unit. A user, as defined herein, is a human or a device, including
a voice activated device. Hence "a user's spoken phrase", "a user issuing a command", and all other actions by a
user include actions by a device and by a human.

[0023] Voice controlled devices include some type of speech recognition in order to be controlled by speech.
Speech recognition and voice recognition are used synonymously herein and have the same meaning. Preferably,
speaker independent speech recognition systems are used to provide the speech recognition capability of the voice
controlled devices. Speaker independent speech recognition systems are responsive to speaker-independent
representations of speech. In the preferred embodiment, a speaker-independent representation of speech is a phonetic
representation of speech. However, other speaker-independent representations of speech may also be used in
accordance with the present invention.

[0024] In order to gain access to the full functionality of a voice controlled device with the present invention, a user
must communicate to the voice controlled device one of its associated appliance names. The appliance name may
include one or more default names or one or more user-assignable names. A voice controlled device may have a
plurality of user-assignable names associated with it in order to provide personalized functionality to each user.
[0025] Additionally, the present invention provides a way to leave a speech recognition engine on throughout ongoing
conversations (including local conversations or those over a telephone link), without having it be falsely triggered by
background noise or speech that is not directed to it. To accomplish this, the invention makes use of a naming scheme
for voice controlled devices provided by the standard VUI of the present invention. In general, unless a voice controlled
device is addressed by its appliance name, it will disregard all speech. (There are a couple of special exceptions to this
rule that will be discussed later.) In certain cases the criteria for recognizing a command may be further tightened by
requiring a voice controlled device to be addressed by its user-assigned appliance name. A voice controlled device may
have multiple users, each of whom assigns it a unique appliance name using commands of the standard VUI of the
present invention. When a voice controlled device is addressed by one of its user-assigned names, the voice controlled
device can determine both that it is being addressed, and which user is addressing it. This allows the voice controlled
device to use the personalized settings for that particular user. For example, a voice-activated telephone might have
four different user-assigned names (e.g. Aardvark, Platypus, Socrates, and Zeus), and each user might have a different
telephone number associated with the phonebook entry for Mother. When the first user says "Aardvark call Mother", the
first user's mother is called. When the second user says "Platypus call Mother", the second user's mother is called. The
command "Geronimo call Mother" would not be acted on by this voice controlled device, since Geronimo is not one of
its appliance names.
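
The name-based addressing in this paragraph can be sketched as a lookup from appliance name to a per-user phonebook. This is a minimal illustration under stated assumptions, not the implementation of the invention: the `Appliance` class, its `handle` method, and the phone numbers are hypothetical, and the sketch assumes a recognizer has already converted the spoken phrase to text.

```python
from dataclasses import dataclass, field


@dataclass
class Appliance:
    """Hypothetical voice controlled telephone with per-user appliance names."""

    # user-assigned appliance name -> that user's personal phonebook
    user_phonebooks: dict = field(default_factory=dict)

    def handle(self, phrase: str):
        """Return (appliance_name, number) if the phrase addresses this device, else None."""
        words = phrase.lower().split()
        if len(words) < 3:
            return None
        name, command, entry = words[0], words[1], " ".join(words[2:])
        if name not in self.user_phonebooks:
            return None  # not one of this device's appliance names: disregard
        if command != "call":
            return None  # only the "call" command is modeled in this sketch
        number = self.user_phonebooks[name].get(entry)
        return (name, number) if number else None


# Two users share one telephone; each assigned it a different name.
phone = Appliance(user_phonebooks={
    "aardvark": {"mother": "555-0101"},   # made-up numbers
    "platypus": {"mother": "555-0102"},
})
print(phone.handle("Aardvark call Mother"))
print(phone.handle("Geronimo call Mother"))
```

Because the appliance name selects the phonebook, the same spoken entry ("Mother") resolves to a different number for each user, and a phrase that starts with an unknown name is ignored.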

[0026] Another aspect of the present invention improves the recognition accuracy of voice controlled devices. The
present invention collectively improves recognition accuracy by requiring first a period of relative silence prior to a
phrase directed at the voice controlled device, second the appliance name, and third a valid command. Complete
silence is not necessary but a relative silence is needed, where relative silence is defined as a sound level that is quieter
than the sound level while the phrase is being spoken. The specific period of relative silence required, and the allowed
decibel difference between the relative silence and the sound intensity of the spoken phrase directed at the voice controlled
device, will depend on the type of voice controlled device being built, its intended operating environment, the
capabilities of the speech recognition system used, and other factors. In some cases, the duration and/or decibel difference
of relative silence required may also be varied by the voice controlled device or associated circuits or software,
so as to maximize the recognition accuracy obtained in that particular circumstance. In accordance with the standard
VUI, each user can assign a voice controlled device a unique name or use a default appliance name. After communicating
the appliance name to a voice controlled device, a command must be spoken. Valid input at this point includes
special phrases like "Help" or "Cancel", which are part of the standard VUI grammar. If a valid command is not recognized,
the voice controlled device rejects the entire sequence and returns to the state where it is waiting for silence.
Additionally, depending on the command, one or more additional phrases, typically representing modifiers to the command,
may be provided or required (for example, the phone number in the command sequence "(silence) Call 555-1212").
Valid phrases at this point also include special phrases like "Help" or "Cancel", which are part of the standard
VUI grammar. Failure to detect valid phrases after the command within a short period of time can be used as a basis
for rejecting the entire command sequence, or for prompting the user to clarify his intentions. Either way, this serves as
an additional level of accuracy checking. Alternatively, if a phrase is not detected during the short period of time after
the command, the command may be performed anyway.
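
The three-part gate described in this paragraph (relative silence, then appliance name, then a valid command) can be sketched as a single check. The function name, its parameters, and the threshold defaults (0.5 s of silence, a 6 dB gap) are assumptions chosen for illustration; the text states that the actual values depend on the device and its environment.

```python
def accept_sequence(silence_s, silence_db, speech_db, words,
                    appliance_names, commands,
                    min_silence_s=0.5, min_gap_db=6.0):
    """Return the recognized command word, or None if the sequence is rejected.

    silence_s  -- duration of the quiet period before the phrase, in seconds
    silence_db -- sound level during that quiet period
    speech_db  -- sound level while the phrase is spoken
    words      -- recognized words of the phrase, in order
    """
    # 1. Relative silence: long enough, and quieter than the phrase by a margin.
    if silence_s < min_silence_s or speech_db - silence_db < min_gap_db:
        return None
    # 2. The phrase must start with one of the device's appliance names.
    if not words or words[0].lower() not in appliance_names:
        return None
    # 3. A valid command (including special phrases such as "help") must follow.
    if len(words) < 2 or words[1].lower() not in commands:
        return None
    return words[1].lower()


names = {"telephone"}
valid = {"call", "help", "cancel"}
print(accept_sequence(1.0, 40.0, 60.0, ["Telephone", "call", "555-1212"], names, valid))
print(accept_sequence(1.0, 40.0, 60.0, ["I'll", "call", "John"], names, valid))
```

Any failed step rejects the whole sequence, which corresponds to the device returning to the state where it waits for silence.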

[0027] Voice controlled devices can be identified either by visual identification, or acoustic identification, or both.
Acoustic identification is defined as including both audible and non-audible communications with the voice controlled
device. Audible and non-audible are defined elsewhere. Visual identification can occur through use of a standard logo
or other visual identifier. A blinking LED is another example of a visual identifier. Visual identification is particularly
appropriate for voice controlled devices that do not have a speech recognition engine that is always turned on. For
example, to minimize battery consumption, battery operated voice controlled devices may require the user to push a
switch (or its equivalent, such as flipping open a flip-type cellphone) to activate the speech recognition engine. Acoustic
identification only works for voice controlled devices that are actively listening for recognizable commands.

[0028] Acoustic identification is accomplished by a user saying an identification phrase. An example of an identification
phrase is "What is out there?". A voice controlled device may have one or more identification phrases. Any voice
controlled device that hears its identification phrase responds to identify its presence. In accordance with the standard
VUI, the response is a random delay of up to 2 seconds of silence, followed by a standard signal (for example, one or
more tones or beeps or other sounds), then at least one of the voice controlled device's appliance names, and any
applicable basic operation instructions (e.g. "(beep) I am Telephone. You can say Telephone help."). In order to coordinate
responses from multiple voice controlled devices in the same communication environment, each voice controlled
device must during its silence period listen for another voice controlled device's response, the start of which is marked
by the standard signal. Detection of the other voice controlled device's standard signal can be accomplished by any
means that is convenient, including by the voice recognition system, by a DSP, by a microprocessor, or by special circuitry.
In the event another voice controlled device starts responding during this silence period, the listening voice controlled
device must restart its silence timing after the responding voice controlled device finishes. In the event two voice
controlled devices start responding at approximately the same time (for example, so that their standard signals overlap
in time), they both must back off for a new randomly selected silence delay, but this time the delay must be of up to
twice the length of the previous silence delay, but not to exceed 16 seconds.
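
The back-off rule in this paragraph can be sketched as follows. The 2 second initial limit and the 16 second cap come from the text; the function names are hypothetical, and the sketch assumes that "twice the length of the previous silence delay" means doubling the upper bound of the random delay, which is one possible reading.

```python
import random

INITIAL_LIMIT_S = 2.0   # from the text: random delay of up to 2 seconds
MAX_LIMIT_S = 16.0      # from the text: not to exceed 16 seconds


def pick_delay(limit_s, rng=random):
    """Choose a random silence delay between 0 and limit_s seconds."""
    return rng.uniform(0.0, limit_s)


def limit_after_collision(prev_limit_s):
    """Double the delay limit after overlapping responses, capped at 16 seconds."""
    return min(prev_limit_s * 2.0, MAX_LIMIT_S)


# Successive limits if a device keeps colliding with another responder.
limit = INITIAL_LIMIT_S
for attempt in range(5):
    print(f"attempt {attempt}: wait {pick_delay(limit):.2f} s (limit {limit:.0f} s)")
    limit = limit_after_collision(limit)
```

Widening the random window after each collision makes it progressively less likely that two devices pick overlapping delays again.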

[0029] In order to restrict which voice controlled devices respond to an identification phrase, a user may include a
voice controlled device's name in the identification phrase. For example, one could say "Socrates are you out there?"
to see if a voice controlled device named Socrates was nearby. Similarly, one could say "Clock are you out there" which
would cause all voice controlled devices with an appliance name of Clock (whether a default appliance name or a user
appliance name) to respond. A possible variation is that voice controlled devices may respond with some response
other than their names, as for example, might be needed for security reasons.
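
A device's decision whether to answer an identification phrase can be sketched as a simple match on the recognized text. The function name `should_respond` is hypothetical, the two phrase patterns are the examples given in the text, and the sketch assumes single-word appliance names.

```python
def should_respond(phrase, appliance_names):
    """Return True if this device should answer the identification phrase.

    appliance_names -- lowercase default and user-assigned names of this device
    """
    words = phrase.lower().strip("?!. ").split()
    # General form: every listening device answers.
    if words == ["what", "is", "out", "there"]:
        return True
    # Name-restricted form: only devices with that appliance name answer.
    if len(words) == 5 and words[1:] == ["are", "you", "out", "there"]:
        return words[0] in appliance_names
    return False


print(should_respond("What is out there?", {"clock"}))
print(should_respond("Socrates are you out there?", {"clock"}))
```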

[0030] A voice controlled device may use both visual and acoustic identification methods. For example, even
though a speech recognition engine is continuously on, it may still display the visual logo and/or other visual identifier.
Similarly, in a voice controlled device that requires manual activation of the speech engine, once enabled, the engine
could then be responsive to the command "What is out there?"

[0031] In another aspect of the present invention, the initial storage of a user's spoken phrase (for example, when
making a new phonebook entry under voice control) is processed by the speaker-independent speech recognition
engine of the voice controlled devices. This engine returns a speaker-independent phonetic representation of the
phrase. This speaker-independent phonetic representation is what is stored.

[0032] When a command is issued by a user, it is also processed by the speaker-independent speech recognition
engine of the present invention. This could be the same speaker-independent engine used for storing the original entries,
or a completely different speaker-independent engine. In either case, the engine returns a speaker-independent phonetic
representation of the command sequence. This speaker-independent phonetic representation can be compared
to earlier stored phonetic representations to determine whether the command is recognizable.
[0033] By converting both the stored spoken entries and any commands to speaker-independent phonetic repre- 
sentation a number of advantages are provided. 

• Recognition will be reliable even if the user's voice has changed, perhaps due to a sickness, stress, fatigue,
  transmission over a noisy or distorting phone link, or other factors that might change a human user's or machine
  user's speech. Text-based information can be stored and then recognized.

• Recognition will be reliable even if some other user had stored the original voice phrase.

• Recognition can be speaker-independent, even for user-stored commands and phrases.

• Stored entries originating from text sources and from different speakers can all be combined and used reliably
  for recognition.

• The use of speaker-independent phonetic representations facilitates upgrading to improved recognition engines as
  they become available. Improved speech recognition engines can use existing stored information without impacting
  reliability or requiring re-storage, since all stored entries are held in phonetic form. New information stored
  using the improved speech recognition engines can be used on equipment with older recognition engines. Old and new
  generations of equipment can interoperate without prior coordination by using phonetic representations. This
  allows, for example, two PDAs to exchange voice-stored phonebook entries and provide reliable recognition to the
  new users of that information. Finally, there are no legacy restrictions to hold back or restrict future
  development of speaker-independent recognition engines as long as they can create phonetic representations, unlike
  waveform-storage based systems, which must always be able to perform exactly the same legacy waveform
  transformations.
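
The storage and comparison of phonetic representations described above can be illustrated with a minimal sketch. It assumes that a recognition engine has already reduced each utterance to a sequence of phoneme symbols, and it assumes an edit-distance comparison with a small tolerance. The class name, the phoneme symbols, and the comparison method are hypothetical and are not taken from the specification.

```python
from typing import Dict, Optional, Tuple

# A phonetic representation is modeled here as a tuple of phoneme symbols.
Phonemes = Tuple[str, ...]


def edit_distance(a: Phonemes, b: Phonemes) -> int:
    """Levenshtein distance between two phoneme sequences."""
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, start=1):
        curr = [i]
        for j, pb in enumerate(b, start=1):
            cost = 0 if pa == pb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


class PhoneticStore:
    """Holds stored entries in phonetic form only (no waveforms)."""

    def __init__(self, max_distance: int = 1) -> None:
        # max_distance is an assumed tolerance, not a value from the text.
        self.entries: Dict[str, Phonemes] = {}
        self.max_distance = max_distance

    def store(self, label: str, phonemes: Phonemes) -> None:
        self.entries[label] = phonemes

    def match(self, phonemes: Phonemes) -> Optional[str]:
        """Return the closest stored label, or None if nothing is close enough."""
        best_label, best_dist = None, self.max_distance + 1
        for label, stored in self.entries.items():
            dist = edit_distance(stored, phonemes)
            if dist < best_dist:
                best_label, best_dist = label, dist
        return best_label
```

Because only phoneme sequences are stored, an entry produced by one engine, one speaker, or a text source could in principle be matched against output from a different engine, which is the interoperability property listed above.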

VOICE CONTROLLED DEVICES 

[0034] Referring now to FIG. 1A, environment 100 is illustrated. Environment 100 may be any communication
environment such as an office, a conference room, a hotel room, or any location where voice controlled devices may
be located. Within environment 100, there are a number of human users 101A-101H, represented by circles. Also within
the environment 100 are voice controlled devices 102A-102H, represented by squares and rectangles, each
operationally controlled by the standard voice user interface (VUI) of the present invention. Voice controlled
devices 102A-102E, represented by rectangles, are fixed within the environment 100. Voice controlled devices
102F-102H, represented by squares, are mobile voice controlled devices that are associated with human users
101F-101H respectively. Voice controlled devices 102A-102H may be existing or future devices. Voice controlled
devices 102A-102E may be commonly associated with a user's automobile, home, office, factory, hotel or other
locations where human users may be found. Alternatively, if the voice controlled devices 102A-102E are to be
controlled by non-audible speech, voice controlled devices may be located anywhere.

[0035] In the present invention, the standard VUI allows a user to associate a user-assignable name with these
voice controlled devices 102A-102H. The user-assignable name of the voice controlled device may be generic such as
telephone, clock, or light. Alternatively, the name may be personalized such as those ordinarily given to humans such
as John, Jim, or George. In either case, each of the voice controlled devices 102A-102H, while constantly listening,
will not respond to commands until it recognizes one of its names (user-assigned or default). Although any name can
be assigned to a voice controlled device, to minimize confusion between the voice controlled device and real people,
users may choose to use unusual names such as Aardvark or Socrates, which are unlikely to occur during normal
conversation. With reference to Figure 1A, consider the environment 100 to be a conference room where human users
101A-101H are meeting. Further assume that voice controlled device 102A is a telephone having speaker phone
capabilities in the conference room 100 and the appliance name is Telephone. A human user such as 101A would first
call out the name Telephone before giving commands to that voice controlled device. By providing names to the
voice controlled devices, the voice controlled devices can properly respond to given commands and avoid confusion
between multiple users and voice controlled devices. The voice controlled device may be a telephone, an organizer, a
calculator, a light fixture, a stereo system, a microwave oven, a TV set, a washer, a dryer, a heating system, a
cooling system, or practically any system. Voice controlled devices 102A-102H may include an audible communications
interface (ACI) in order to listen to commands and data input from human users 101A-101H and audibly notify a user
that the command or data was properly interpreted and executed. Voice controlled devices 102A-102H further include a
speech recognition and synthesis system (SRS). The speech recognition of the SRS provides for interpreting speech
in different dialects independent of which user is speaking, and independent of whether the user is a human or
device. While the preferred embodiments of the present invention utilize a speaker independent voice recognition
system, the present invention is also compatible with speaker dependent voice recognition systems. The SRS may
operate with one or more than one language. The speech synthesis of the SRS provides for generation of speech
responses, status commands, or data by the voice controlled devices which may be audibly communicated or non-audibly
communicated. Speech synthesis, also referred to herein as speech generation, is defined herein to include any
method of responding with speech (audible or non-audible), including but not limited to, speech recording, storage
and playback systems, pre-recorded vocabulary systems with playback, sophisticated speech synthesis systems
generating utterances from a combination of characters, and some combination of the above. Preferably the voice
controlled devices contain both a speech recording, storage and playback system and a pre-recorded vocabulary system
with playback.

[0036] Voice controlled devices 102A-102H may optionally include a communications interface (ECI) for providing
remote control of the voice controlled device via wireless or wired means using non-audible voice or speech. As
illustrated in FIG. 1A, voice controlled device 102A has a connection 105 for connection to a telephone system. In
this manner, the voice controlled device 102A may remotely communicate with a user and accept and acknowledge
commands. Referring now to FIG. 1B, the human user 101I communicates by telephone 112 over the wired or wireless
transmission media 114 over the telephone company switch 116. The telephone company switch 116 is connected by a
wired means or wireless means through connection 105 to the voice controlled device 102A. Telephone 112 may be a
wireless or wired telephone. In this manner, human user 101I may remotely interface to a voice controlled device
102A within a communications environment 100. Alternatively, a voice controlled device such as voice controlled
device 102E may be remotely controlled over a network by a remote computer 118. In this case, a remote human user
101J can send voice commands or instructions through remote computer 118, which is coupled to the voice controlled
device 102E through the network connection 120 and connection 106. The network connection 120 may be a wireless or
wired connection, realtime or store-and-forward, through a computer network such as the Internet. There are a wide
variety of ways that a remote user can be connected to a voice controlled device, including but not limited to, the
use of wired and wireless connections. Wired connections may include, but are not limited to, realtime
communications systems such as the telephone system and realtime Internet connections, and store-and-forward systems
such as email of voice representations and other non-realtime Internet protocols. Wireless systems may include, but
are not limited to, radio and infrared systems. Any of these alternatives can include circuit-based systems and
packet-based systems, and can include analog and digital systems. Any of these alternatives can be used with or
without various modulation and/or encoding and/or encryption schemes.

[0037] Referring now to Figure 2, exemplary voice controlled devices 102I-102M are illustrated. The voice
controlled device 102I is exemplary of white goods such as freezers, refrigerators, washers, dryers, air
conditioners, heating units, microwave ovens, ovens, and stoves. Voice controlled device 102J is exemplary of voice
controlled devices requiring an optional communication interface (ECI). This may include voice controlled devices
for consumer electronics such as televisions, video cassette recorders, stereos, camcorders, tape recorders,
dictation units, alarm clocks, and clock radios as well as telephone products such as standard wired telephones,
telephone answering machines, light switches, alarm systems, computing devices, Internet access devices, and
servers, etc. Voice controlled device 102K is exemplary of portable or wireless systems such as cellular telephones,
walkman style systems, camcorders, and personal digital systems. Voice controlled device 102L is exemplary of
automobile voice controlled systems such as car cellular telephone systems, automobile radio systems, car navigation
systems, HAV (heating, air conditioning and ventilation) systems, and other control systems for an automobile. Voice
controlled device 102M is exemplary of remote controlled devices, such as voicemail systems.

[0038] Voice controlled device 102I includes an audible communications interface (ACI) 202, a speech recognition
and synthesis system (SRS) 204, and an appliance peripheral and control circuit (APCC) 206. The ACI 202 is coupled
to SRS 204 and SRS 204 is coupled to APCC 206. In the voice controlled device 102I, ACI 202 is its primary means
of speech communication.

[0039] Voice controlled device 102J includes ACI 202, SRS 204, APCC 206, communications interface (ECI) 207,
and connection 208. ACI 202 is coupled to SRS 204. APCC 206 is coupled to SRS 204. ECI 207 couples to SRS 204
and connection 208 couples to the ECI 207. Voice controlled device 102J can alternatively communicate using speech
or voice communication signals through ACI 202 or ECI 207. Voice controlled device 102K includes ACI 202, SRS 204,
APCC 206, and an antenna 209.

[0040] Voice controlled device 102K can communicate using audible speech signals through the ACI 202 or using
encoded speech signals through the ECI 207. ECI 207 couples to APCC 206. ECI 207 also couples to connection 212.
Connection 212 could, for example, be an antenna or infrared port. Voice controlled device 102L also includes an ACI
202, SRS 204, APCC 206, and an antenna 209. ACI 202 couples to SRS 204. SRS 204 couples to APCC 206. Antenna
209 couples to APCC 206. Voice controlled device 102L can communicate by means of ACI 202 and APCC 206 through
antenna 209.

[0041] Voice controlled device 102M includes an APCC 206, SRS 204, an ECI 207, and connection 210. Connection
210 may be a wired or wireless connection, including an antenna. SRS 204 couples to APCC 206 and also to ECI
207. Connection 210 couples to ECI 207. Voice controlled device 102M can communicate via ECI 207 over connection
210.

[0042] The APCC 206 represents the elements of the voice controlled device 102 that are to be controlled. For
example, in the case of white goods, the items to be controlled may be temperature, a time setting, a power setting,
or a cycle depending on the application. In the case of consumer electronics, the APCC 206 may consist of those
items normally associated with buttons, switches, or knobs. In the case of telephone products, the APCC 206 may
represent the buttons, the dials, the display devices, and the circuitry or radio equipment for making wired or
wireless calls. In the case of automobile systems, the APCC 206 may represent instrumentation panels, temperature
knobs, navigational systems, and the automobile radio's channels, volume, and frequency characteristics.

[0043] Referring now to FIG. 3, the voice controlled device 102 is illustrated. Voice controlled device 102,
illustrated in FIG. 3, is exemplary of the functional blocks within voice controlled devices described herein. Voice
controlled device 102 includes the ACI 202, the APCC 206 and the SRS 204. The voice controlled device 102 may also
have an ECI 207 such as ECI 207A or ECI 207B.

[0044] The ACI 202 illustrated in FIG. 3 includes microphone 303, speaker 304, and amplifiers 305. The SRS 204
as illustrated in FIG. 3 includes the voice communication chip (VCC) 301, coder/decoders (CODECs) 306 and 308, host
microcontroller 310, power supply 314, power on reset circuit 316, quartz crystal oscillator circuit 317, memory
318, and memory 328. The SRS 204 may optionally include an AC power supply connection 315, an optional keypad 311
or an optional display 312. For bidirectional communication of audible speech, such as for local commands, prompts
and data, the speech communication path is through the VCC 301, CODEC 306, and the ACI 202. For bidirectional
communication of non-audible speech, such as for remote commands, prompts and data, the non-audible speech
communication path is through the VCC 301, CODEC 308, and ECI 207A, or through the VCC 301, host microcontroller
310, APCC 206, and ECI 207B. The ECI 207 may provide for a wired or wireless link such as through a telephone
network, computer network, internet, radio frequency link, or infrared link.

[0045] Voice communication chip 301 provides the voice controlled device 102 with a capability of communication
via speech using the standard voice user interface of the present invention. Microphone 303 provides the voice
controlled device 102 with the capability of listening for audible speech, such as voice commands and the device's
appliance names. Microphone 303 may be a near field or far field microphone depending upon the application. For
example, near field microphones may be preferable in portable cell phones where a user's mouth is close, while far
field microphones may be preferable in car cell phones where a user's mouth is a distance away. Speaker 304 allows
the voice controlled device 102 to respond using speech, such as for acknowledging receipt of its name or commands.
Amplifiers 305 provide amplification for the voice or speech signals received by the microphone 303. Additionally,
the amplifiers 305 allow amplification of representations of voice signals from the CODEC 306 out through the
speaker 304 such that a human user 101 can properly interface to the voice controlled device 102.

[0046] Microphone 303 and speaker 304 are each transducers for converting between audible speech and
representations of speech. CODEC 306 encodes representations of speech from the ACI 202 into an encoded speech
signal for VCC 301. In addition, CODEC 306 decodes an encoded speech signal from the VCC 301 into a representation
of speech for audible communication through the ACI 202.

[0047] Alternatively, non-audible speech signals may be bi-directionally communicated by the voice controlled
device 102. In this case, VCC 301 provides encoded speech signals to CODEC 308 for decoding. CODEC 308 decodes
the encoded speech signal and provides it to the ECI 207A for communication over the connection 105. Speech signals
may be received over the connection 105 and provided to the ECI 207A. The ECI 207A couples the speech signals into
the CODEC 308 for encoding. CODEC 308 encodes the speech signals into encoded speech signals, which are coupled
into the VCC 301.

[0048] Speech signals may also be electronically communicated through the APCC 206. Speech signals from the
VCC 301 for transmission are passed to the microcontroller 310. Microcontroller 310 couples these into the APCC 206,
which transmits the speech signals out to the ECI 207B. Speech signals to be received by the voice controlled device
102 may be received by the ECI 207B and passed to the APCC 206. The APCC 206 then may couple these received
speech signals to the microcontroller 310, which passes these on to the VCC 301 for recognition.

[0049] The voice controlled device 102 controls the APCC 206 by means of signals from the host microcontroller
310. The host microcontroller 310 is coupled to the APCC 206 to facilitate this control. Voice controlled device 102
may optionally have a keypad 311 coupled to the microcontroller 310 as a further input means. The keypad may be a
power button, a push to talk button or a security code input means, in addition to optionally being used to input
other information. Voice controlled device 102 may optionally include a display 312 coupled to the host
microcontroller 310 in order to visually display its status or other items of interest to a user. However, the voice
controlled device can function generally without the optional keypad 311 or the optional display 312.

[0050] The voice controlled device 102 includes power supply 314. Power supply 314 may generate power from a
DC supply source or an AC supply source, or from both. The source of DC supply may be a battery, solar cell, or other
DC source. In the case of an AC supply source, the optional AC power cord 315 is provided. The voice controlled
device 102 includes a power on reset circuit 316 to reset its system when the power supply 314 is turned on.

[0051] Quartz crystal oscillator circuit 317, in conjunction with other circuitry within the VCC 301, provides an
accurate oscillation input to the VCC 301 for generation of clock signals.

[0052] Memory 318 is coupled to VCC 301 and provides rewritable non-volatile and volatile memory as well as a
read only memory. These typically are a flash RAM, a static RAM, and a ROM. Memory 318 is used to store programs
as well as store pre-recorded and recorded phrases. Additionally, memory 318 provides scratch memory for program
operation. As is standard practice in the industry, the types of memories used may vary depending on the specific
voice controlled device being constructed. Program storage for the present invention may be permanent, as with a
ROM, non-volatile but changeable, as with a flash, or volatile, as in a RAM, in which case the program could be
downloaded from a non-volatile memory, or from a remote source.

[0053] Memory 328 may be volatile memory, non-volatile memory, or a mixture. If only volatile memory is used, its
contents can be downloaded from another location for initialization. The size and capabilities of memory 328 will
depend on the type of voice controlled device being built. Alternatively, memory may be substituted in some cases
for a type of magnetic, optical or other type of storage medium.

[0054] In the voice controlled device 102, VCC 301 may additionally include the functionality of the host
microcontroller 310 such that only one processing unit is contained within the voice controlled device 102.
Similarly, the APCC 206, CODECs 306 and/or 308, ECI 207A, ECI 207B, memory 318, memory 328, amplifiers 305, or other
elements may be integrated into VCC 301, as is customary in the industry as ever-increasing levels of integration
are achieved.

[0055] Referring now to FIG. 4, a block diagram of the voice communication chip (VCC) 301 is illustrated. The voice
communication chip 301 is an integrated circuit and includes the processing units 402, memory units 403, a Bus and
Memory Controller (BMC) 404, a bus adapter 405, and peripherals 406. The voice communication chip 301 is further
described in the microfiche appendix entitled "ISD-SR 300, Embedded Speech Recognition Processor" by Information
Storage Devices, Inc. The processing units 402 include a microprocessor and a digital signal processing module
(DSPM). The memory units 403 include a DSPM random access memory (RAM) 407, a system RAM 408, and a read
only memory (ROM) 409. The peripherals 406 include I/O ports 420, an Interrupt Control Unit (ICU) 422, a
coder/decoder (CODEC) interface 424, a Pulse Width Modulator (PWM) 426, a MICROWIRE interface 428, a Master
MICROWIRE controller 430, a reset and configuration controller 432, a clock generator 434 and a WATCHDOG timer 436.
In order to communicate effectively, the voice communication chip 301 includes a core bus 415 and a peripheral bus
interconnecting the components as shown in FIG. 4.

[0056] The microprocessor 416 is a general purpose 16-bit microprocessor core with a RISC architecture. The
microprocessor 416 is responsible for integer arithmetic logic and program control. The DSP Module (DSPM) 418
performs DSP arithmetic. ROM 409 and system RAM 408 are used for the storage of programs and data. DSPM RAM 407
can be accessed directly by the DSPM 418. When the DSPM 418 is idle, the microprocessor 416 can access the DSPM
RAM 407.

[0057] The Bus and Memory Controller (BMC) 404 controls access to off-chip devices, such as DRAM, Expansion
Memory, off-chip Base Memory and I/O Expansion. The I/O ports 420 provide the interface to devices coupled to the
voice communication chip 301. The I/O ports 420 represent twenty-six I/O pins of the voice communication chip 301.
Using the internal ROM 409 for program memory without expansion options, sixteen I/O pins can be individually
configured for input or output, eight I/O pins are dedicated for output only and two I/O pins are dedicated for
input only. The ICU 422 provides the capability of processing five maskable interrupts (four internal and one
external) and three internal Non-Maskable Interrupts (NMIs). The CODEC interface 424 provides a direct interface to
one CODEC device 306 in the case of ACI 202 only, or two CODEC devices 306 and 308 in the case of ACI 202 and ECI
207A. The Pulse Width Modulator (PWM) 426 generates a square wave with a fixed frequency and a variable duty
cycle. The MICROWIRE interface 428 allows serial communication with the host microcontroller 310. The Master
MICROWIRE controller 430 allows interface to serial flash memory and other peripherals. The reset and configuration
block 432 controls definition of the environment of the voice communication chip 301 during reset and handles
software controlled configurations. Some of the functions within the voice communication chip 301 are mutually
exclusive. Selection among the alternatives is made upon reset or via a Module Configuration register. The clock
generator 434 interfaces to the quartz crystal oscillator circuit 317 to provide clocks for the various blocks of
the voice communication chip 301, including a real-time timer. The clock generator can also be used to reduce power
consumption by setting the voice communication chip 301 into a power-down mode and returning it into normal
operation mode when necessary. When the voice communication chip 301 is in power-down mode, some of its functions
are disabled and contents of some registers are altered. The watchdog timer 436 generates a non-maskable interrupt
whenever software loses control of the processing units 402 and at the expiration of a time period when the voice
communication chip 301 is in a power-down mode.

STANDARD VOICE USER INTERFACE 

[0058] Similar to computer operating systems providing a GUI, the standard voice user interface (VUI) can be
thought of as being provided by a standard VUI operating system code. The standard VUI operating across a wide array
of voice controlled devices allows a user to interface with any one of the voice controlled devices, including those
a user has never previously interacted with. Once a user is familiar with the standard VUI, they can walk up to and
immediately start using any voice controlled device operating with the standard VUI. The standard VUI operating
system code has specific standardized commands and procedures with which to operate a voice controlled device. These
standardized commands and procedures are universal to machines executing the standard VUI operating system code.
Voice controlled application software, operating with the standard VUI operating system code, can be written to
customize voice controlled devices to specific applications. The voice controlled application software has voice
commands specific to the application for which the voice controlled device is used. A particular voice controlled
device may also have additional special features that extend the core capabilities of the standard VUI.

[0059] Some of the standard VUI functionality in the core VUI includes a way to discover the presence of voice
controlled devices, a core common set of commands for all voice controlled devices, a way to learn what commands
(both core commands and appliance-specific commands) the voice controlled device will respond to, a vocalized help
system to assist a user without the use of a manual or display, a way to personalize the voice controlled device to
a user with user assignable settings, security mechanisms to restrict use of voice controlled devices to authorized
users and protect user assignable settings and information from other users, and standard ways for a user to
interact with voice controlled devices for common operations (e.g. selecting yes or no, listing and selecting items
from a list of options, handling errors gracefully, etc.). The standard VUI includes an API (Applications
Programming Interface) to allow software developers to write custom voice controlled applications that interface and
operate with the standard VUI and extend the voice controlled command set.

[0060] Referring now to FIG. 5, a block diagram illustrates the Software 500 for controlling voice controlled
device 102 and which provides the standard VUI and other functionality. The Software 500 includes Application Code
510, a VUI software module 512 and a Vocabulary 524. Application Code 510 may be further modified to support more
than one application, representing multiple application code modules, to provide for further customization of a
voice controlled device 102. The Vocabulary 524 contains the phrases to be detected. The phrases within the
Vocabulary are divided into groups called Topics, of which there may be one or more. In Figure 5, the Vocabulary 524
consists of two Topics, Topic 551 and Topic 552.

[0061] Typically, Application Code 510 interfaces to the VUI software 512 through the Application Programming
Interface (API) 507. The VUI software 512 provides special services to the Application Code 510 related to voice
interface, including recognition and prompting. The interrelationship between the VUI software 512 and the
Application Code 510 is analogous to that between Microsoft's MS Windows and Microsoft Word. Microsoft Windows
provides special services to Microsoft Word related to displaying items on a screen and receiving mouse and keyboard
inputs.

[0062] Generally the Application Code 510 may be stored in host memory and executed by the host microcontroller
310. However, the functionality of the host microcontroller 310 can be embedded into the VCC 301 such that only one
device or processor and one memory or storage device is needed to execute the code associated with the software 500.

[0063] All phrases that can be recognized, including those phrases for the core and application specific commands,
are included in the Vocabulary 524. The VUI software module 512 can directly access the vocabulary phrases, for
example for use during recognition. The VUI software module 512 can also process tokens. Tokens abstractly relate
to the phrases within the Topics 551-552. Tokens are integer numbers. For example, the phrase for 'dial' might have a
token value of '5', and the phrase for 'hang-up' might have a token value of '6'. There is a token value assigned to
every phrase that can be recognized. Because the VUI software module 512 can process tokens related to the
vocabulary file 524, it can refer to phrases without having to directly access them. This makes it possible to
change languages (from English to French, etc.) without modifying the VUI software module 512. Thus, the standard
VUI will function using different dialects or languages simply by modifying the vocabulary file 524.
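
The token indirection described above can be sketched as follows. The token values 5 and 6 follow the example in the text; the data layout, the function names, the topic name, and the French phrases are illustrative assumptions and do not describe the actual format of the vocabulary file 524.

```python
from typing import Dict, Optional

# Assumed layout: topic name -> phrase -> integer token.
Vocabulary = Dict[str, Dict[str, int]]

ENGLISH: Vocabulary = {"telephone": {"dial": 5, "hang-up": 6}}
FRENCH: Vocabulary = {"telephone": {"composer": 5, "raccrocher": 6}}

# The handling code refers only to tokens, never to the phrases themselves.
ACTIONS: Dict[int, str] = {5: "dialing", 6: "hanging up"}


def recognize(vocabulary: Vocabulary, topic: str, phrase: str) -> Optional[int]:
    """Map a recognized phrase within a topic to its token, if any."""
    return vocabulary.get(topic, {}).get(phrase.strip().lower())


def handle(token: Optional[int]) -> str:
    """Act on a token; this code is unchanged when the vocabulary is swapped."""
    if token is None:
        return "not recognized"
    return ACTIONS.get(token, "unknown token")
```

In this sketch, switching from `ENGLISH` to `FRENCH` changes only the vocabulary data passed to `recognize`, while `handle` stays the same, which mirrors the language independence described in the paragraph above.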

[0064] Core capabilities of the standard VUI operating in a voice controlled device allow a user to: name the voice
controlled device, identify the presence of voice controlled devices, activate a user's previously stored
personalized preferences, recover from misrecognitions by canceling an operation, use a Help function to identify
the commands and options that can be used with the voice controlled device, use a standard core set of commands and
use other additional commands, confident that they follow a standard syntax. (Although the syntax of commands is
common, the specific list of commands on any voice controlled device will depend on the nature of the voice
controlled device.) The standard VUI also includes standard functions for the following user interactions for the
API: GETYESNO - Accepting a Yes/No response from the user; GETRESPONSE - Accepting an arbitrary input from the user;
GETRESPONSEPLUS - Accepting an arbitrary input from the user, with enhanced error recovery features; LISTANDSELECT -
Providing the user with a list of choices, and allowing the user to select one; and ACOUSTICADDWORD - Adding a
phrase that can thereafter be recognized.
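
As an illustration only, two of the interactions named above might be sketched as below. The specification gives the function names but not their signatures, so the parameters, the `listen` and `say` callbacks, the "cancel" phrase handling, and the retry limit are all assumptions made for this example.

```python
from typing import Callable, Optional, Sequence

Listen = Callable[[], str]    # hypothetical: returns the next recognized phrase
Say = Callable[[str], None]   # hypothetical: speaks a prompt to the user


def get_yes_no(prompt: str, listen: Listen, say: Say,
               max_attempts: int = 3) -> Optional[bool]:
    """Sketch of a GETYESNO-style interaction; None means no usable answer."""
    for _ in range(max_attempts):
        say(prompt)
        answer = listen().strip().lower()
        if answer == "yes":
            return True
        if answer == "no":
            return False
        if answer == "cancel":
            return None
    return None


def list_and_select(items: Sequence[str], listen: Listen,
                    say: Say) -> Optional[str]:
    """Sketch of a LISTANDSELECT-style interaction: offer each item in turn."""
    for item in items:
        say(item)
        answer = listen().strip().lower()
        if answer == "yes":
            return item
        if answer == "cancel":
            return None
    return None
```

Passing the recognizer and the prompt player in as callbacks keeps the sketch independent of any particular speech engine, so it can be exercised with scripted answers.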

[0065] In order to properly function with the standard VUI, the SRS 204 of the voice controlled device 102 can
provide continuous recognition of speech and digits when powered up. However, pauses exceeding certain durations may
be recognized by the SRS 204 as marking the end of a command or providing an indication that an incomplete command
sequence has been received.
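
A minimal sketch of pause-based segmentation is given below. It assumes the recognizer reports each recognized word with start and end times in milliseconds, and it uses a 500 ms pause threshold. Both the input format and the threshold are assumptions for illustration; the specification does not state particular durations.

```python
from typing import List, Sequence, Tuple

# Hypothetical recognizer output: (word, start_ms, end_ms).
TimedWord = Tuple[str, int, int]


def group_by_pauses(words: Sequence[TimedWord],
                    max_gap_ms: int = 500) -> List[List[str]]:
    """Split a word stream into commands wherever a pause exceeds max_gap_ms."""
    commands: List[List[str]] = []
    current: List[str] = []
    last_end = None
    for text, start_ms, end_ms in words:
        if last_end is not None and start_ms - last_end > max_gap_ms:
            commands.append(current)  # a long pause ends the current command
            current = []
        current.append(text)
        last_end = end_ms
    if current:
        commands.append(current)
    return commands
```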

NAMES 

[0066] A key element of the standard VUI of the present invention is that each voice controlled device has one or
more appliance names, each of which is a phrase. The initial appliance name is a default name for a voice controlled
device programmed by the manufacturer at the factory. However, users can generally assign a user-assigned appliance
name of their choosing to a voice controlled device. Naming a voice controlled device is different from other kinds
of naming, such as naming people. A person has a single (first) name that can be used by everyone who wants to talk
with them. In contrast, with naming of voice controlled devices, every user of a voice controlled device usually
gives the voice controlled device a different, unique name. Accordingly, a voice controlled device may have as many
names as it has users.

[0067] When a user addresses a voice controlled device by name, two things happen. First, when the voice controlled
device recognizes one of its names, the voice controlled device is notified that it is being addressed and will need
to listen for a command. Second, since each user usually employs a different name for a voice controlled device, the
device is informed of the user's identity (speaker identification). If a user has stored preferences related to the
functionality of the voice controlled device, the voice controlled device can personalize itself to the preferences
of that user.

[0068] To illustrate this naming concept, consider the following example of a desktop telephone, the voice
controlled device, having two users. User 1 has named the phone "Aardvark" and user 2 has named the phone
"Platypus". If the phone hears "Aardvark Call Mom", the phone will recognize that it is being addressed by user 1 and
that it should use user 1's phonebook. Accordingly, it will dial the number for "Mom" programmed by user 1.
Similarly, if the phone hears "Platypus Call Mom", it will recognize that user 2 is addressing it, and it will dial
the number for "Mom" programmed by user 2.
[0069] In order to minimize false recognition, it is preferable that users assign names to the voice controlled devices
that are generally not spoken during normal speech. Choosing unusual names also helps ensure that two voice controlled
devices within audible range of each other don't have identical names (perhaps assigned by different users). A maximum
time limit for saying the phrase name may be required in some cases due to memory limitations in the voice controlled
device.
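
The naming scheme described above, in which the spoken appliance name both addresses the device and selects a user, can be sketched in Python as a lookup from name to per-user profile. This is an illustrative sketch only: the class names, the restriction to single-word names, and the phone numbers are assumptions made for the example and are not part of the specification.

```python
from dataclasses import dataclass, field


@dataclass
class UserProfile:
    """Per-user settings selected by the appliance name (hypothetical structure)."""
    label: str
    phonebook: dict = field(default_factory=dict)  # voicetag -> number


class NamedPhone:
    """Toy phone that identifies the speaker from the appliance name used."""

    def __init__(self):
        self.profiles = {}  # appliance name (lower case) -> UserProfile

    def assign_name(self, name, profile):
        self.profiles[name.lower()] = profile

    def handle(self, utterance):
        # Assumption: the appliance name is a single word at the start.
        name, _, rest = utterance.strip().partition(" ")
        profile = self.profiles.get(name.lower())
        if profile is None:
            return None  # not one of this device's names: ignore the speech
        command, _, argument = rest.partition(" ")
        if command.lower() == "call":
            number = profile.phonebook.get(argument.lower())
            if number is None:
                return "That name is not in the phone book"
            return f"Calling {number}"
        return None


phone = NamedPhone()
# Placeholder numbers, invented for the example.
phone.assign_name("Aardvark", UserProfile("user 1", {"mom": "555-0101"}))
phone.assign_name("Platypus", UserProfile("user 2", {"mom": "555-0202"}))
```

With this setup, "Aardvark Call Mom" and "Platypus Call Mom" resolve to different phonebook entries, which is the personalization effect the text describes.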

[0070] Referring now to FIGs. 6A-6E, flow charts of the detailed operation of the standard VUI with voice controlled
devices 102 are described. In the flow charts of FIGs. 6A-6E, a solid box shows a phrase communicated by a user
(placed in quotes) or a user action (no quotes). A dotted box shows a phrase communicated by the voice controlled
device (in quotes) or an action taken (no quotes). In the case where there is a solid box directly below a dotted box, a
path exiting from the right of a dotted box is taken if the action within the current dotted box is completed normally and
the path to the solid box below a dotted box is taken if an unusual event occurs. Generally, the solid box directly below
the dotted box indicates the unusual event.

STANDARD VUI COMMAND SYNTAX

[0071] Referring now to FIG. 6A, the general syntax for all voice commands is:

<silence> <name> <command> <modifiers & variables>

The <silence> is a period of relative silence during which the user is not speaking, although background noise and
background speech may still be present. The <name> is the appliance name associated with a voice controlled device 102.
The <command> is an operation that a user wants performed. The <modifiers & variables> consist of additional information
needed by some commands. The SRS 204 recognizes the elements of this syntax so that a user can control voice
controlled devices.

[0072] Most voice controlled devices will continuously listen for the voice command sequence. When a voice controlled
device hears its <name>, it knows that the following <command> is intended for it. Since each user has a different
<name> for a voice controlled device, the <name> also uniquely identifies the user, allowing the voice controlled device to
select that user's personalization settings. Commands include core VUI commands included with all voice controlled
devices, and commands specific to a given application, all of which are stored within the vocabulary 524.

[0073] Requiring <silence> before detection of <name> helps prevent false detection of <name> during normal
conversational speech (i.e. during periods when the user is speaking conversationally to other users and not to the voice
controlled device). In all cases, the duration of <silence> can be configured by the manufacturer and can range from 0 (no
<silence> required) to a second or more. Typically it will be about a quarter of a second.

[0074] Examples of voice command sequences that might be used with a voice controlled device such as a telephone
named Aardvark include "Aardvark Call The Office", "Aardvark Dial 1-800-55-1212", and "Aardvark Hang-up".
(In the command examples and descriptions provided, for the sake of brevity the <silence> is often not shown, and even
where it is shown or described, the option always exists of a manufacturer choosing to use a silence duration of zero.)
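
The general syntax can be sketched as a small parser that first checks the leading silence, then matches an appliance name, then a command, and treats whatever remains as modifiers and variables. This is a hedged illustration operating on already-recognized text: the function and type names, and the idea of passing the measured silence as a number of seconds, are assumptions for the example rather than the patented recognizer.

```python
from typing import NamedTuple, Optional


class ParsedCommand(NamedTuple):
    name: str
    command: str
    modifiers: tuple


def _match_prefix(tokens, phrases):
    """Return the longest phrase (as a token tuple) that `tokens` starts with."""
    best = None
    for phrase in phrases:
        p = tuple(phrase.lower().split())
        if p and tuple(tokens[:len(p)]) == p and (best is None or len(p) > len(best)):
            best = p
    return best


def parse_command(utterance, preceding_silence_s, names, commands,
                  min_silence_s=0.25) -> Optional[ParsedCommand]:
    """Parse <silence> <name> <command> <modifiers & variables>.

    min_silence_s is manufacturer-configurable; 0 disables the silence check.
    """
    if preceding_silence_s < min_silence_s:
        return None  # name spoken mid-conversation: do not treat as addressed
    tokens = utterance.lower().split()
    name = _match_prefix(tokens, names)
    if name is None:
        return None
    rest = tokens[len(name):]
    command = _match_prefix(rest, commands)
    if command is None:
        return None
    return ParsedCommand(" ".join(name), " ".join(command),
                         tuple(rest[len(command):]))


NAMES = {"Aardvark"}
COMMANDS = {"call", "dial", "hang-up"}
```

Setting `min_silence_s=0` corresponds to a manufacturer choosing a silence duration of zero.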
[0075] There are two special cases where the command syntax is permitted to differ from the general syntax. The
first special case is in voice controlled devices that do not continuously listen for <silence> <name>. For example, in some
battery operated applications, power consumption limitations may require the VCC 301 in the voice controlled device
102 to be powered down during idle periods. Another example is a voice controlled device located where false
recognition of a name would have undesirable results, for example, a desktop phone in a conference room during a
presentation. A third example is voice controlled devices where there is a high risk of false recognition, for example,
where multiple conversations can be heard.
[0076] For these types of situations, an alternate command syntax is used in conjunction with a button or switch of
some type. The first alternate command syntax is:

<activation of a switch> <silence (optional)> <name> <command> <modifiers & variables>

In this syntax, the <activation of a switch> means the user presses a button or performs some other mechanical act (e.g.
opening a flip-style cell phone) to activate the recognition capability.

[0077] A second special case is where the user normally enters a series of commands in quick succession. For
these cases, the user can identify themselves once to the voice controlled device using a password protection method,
or by issuing a command that includes the voice controlled device's appliance <name>, and thereafter continue entering
commands. The second alternate command syntax (in this example, for three successive commands) is:

<silence> <name> <command> <modifiers & variables as needed>
<silence> <name (optional)> <command> <modifiers & variables as needed>
<silence> <name (optional)> <command> <modifiers & variables as needed>

With this syntax, the user can issue a series of commands without having to constantly repeat the voice controlled
device's appliance <name>. However, the user is permitted to say the <name> at the start of a command. Note that in this
syntax, the <silence> is required to properly recognize the spoken <name> or <command>.

[0078] When either of the first or second alternate syntaxes is used, it is desirable to ensure that if a new user starts
working with the voice controlled device, they are properly identified. This can be ensured by explicitly requiring the
<name> after a period of inactivity or after power-up of the voice controlled device, or by another similar protocol.
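
One way to picture the second alternate syntax together with the re-identification rule is a small session object: once a name has been spoken, later commands may omit it, but after a period of inactivity the name is required again. The sketch below is an assumption-laden illustration (single-word names, a 30-second inactivity limit, and the class and method names are all invented for the example).

```python
class CommandSession:
    """Tracks which appliance name (and therefore which user) is active."""

    def __init__(self, names, inactivity_limit_s=30.0):
        self.names = {n.lower() for n in names}
        self.inactivity_limit_s = inactivity_limit_s  # assumed value
        self.active_name = None
        self.last_activity = None

    def accept(self, utterance, now):
        """Return (name, command) if the utterance is accepted, else None."""
        tokens = utterance.lower().split()
        # Require the name again after a period of inactivity.
        if (self.active_name is not None
                and now - self.last_activity > self.inactivity_limit_s):
            self.active_name = None
        # The name is always permitted at the start of a command.
        if tokens and tokens[0] in self.names:
            self.active_name = tokens[0]
            self.last_activity = now
            tokens = tokens[1:]
        if self.active_name is None or not tokens:
            return None
        self.last_activity = now
        return (self.active_name, " ".join(tokens))


session = CommandSession({"Aardvark"}, inactivity_limit_s=30.0)
```

A power-up would correspond to constructing a fresh `CommandSession`, which starts with no active name.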

STANDARD CORE VUI COMMANDS

[0079] There are a number of standard core commands included in the vocabulary 524 of voice controlled devices
102 operating using the standard VUI. FIGs. 6A-8 illustrate the syntax of the following commands.
[0080] Referring to FIG. 6A, at start 600, the appliance name, <name>, of a voice controlled device is usually spoken
prior to a command. Any of the voice controlled device's appliance names can be spoken whenever the voice controlled
device is listening for a command. If the <name> is not followed by a command within some period of time, the voice
controlled device will return to start 600 in its original idle state. This is indicated by the solid box Silence of
N seconds. N in this case is a programmable value, usually application dependent and assigned by the voice controlled
device manufacturer. After supplying the appliance name, a user is granted access to further commands of the standard
VUI operating on the voice controlled device at 601.
[0081] The syntax of the Help command is:

<name> Help <command (optional)>
or
Help <command (optional)>

The Help command can be invoked at any time, including when any other command can be given, or whenever the
voice controlled device is waiting for a response. If the Help command is issued while the voice controlled device is
waiting for a valid command, Help must be preceded with <name> if the voice controlled device requires a <name> before
other commands. If the Help command is requested while the voice controlled device is waiting for any other type of
response, <name> does not need to precede the Help command. In all cases where <name> is not required before Help,
if the user says "<name> Help", the use of <name> does not generate an error.

[0082] The help function is context sensitive - whenever Help is requested, the voice controlled device responds
with a description of the available options, given the current context of the voice controlled device. If Help is requested
when the voice controlled device is listening for a command, the voice controlled device will respond with its state and
a list of the commands that it can respond to (e.g. "At Main menu. You can say ...."). Further detail on any specific
command can be obtained with the "Help <command>" syntax (e.g. "Help Dial", "Help Call", and even "Help Help"). If "Help"
is requested while the voice controlled device is waiting for some type of non-command response (e.g. "Say the name"),
then the voice controlled device will respond with a statement of the voice controlled device's current status, followed
by a description of what it is waiting for (e.g. "Waiting for user response. Say the name of the person whose phonebook
entry you wish to create, or say Nevermind to cancel.").
[0083] The syntax of the cancellation command is:

<name (optional)> Nevermind
or
<name (optional)> Cancel

The Nevermind or Cancel command can be issued whenever the voice controlled device is executing a command and
waiting for a response from the user. Nevermind or Cancel causes the voice controlled device to cancel the current
command and respond with a statement that the operation has been cancelled (e.g. "Cancelled."). If Nevermind or
Cancel is issued while the voice controlled device is waiting for a command, it can be ignored. The use of <name> with
Nevermind or Cancel is optional - it works identically whether or not <name> is spoken.
[0084] The syntax of the return to main menu command is:

<name> Main Menu

For voice controlled devices that have submenus of commands, <name> Main Menu returns the user to the main menu
and causes a response of "At Main menu." or the like. This command provides an easy way for the user to return to a
known point from any submenu. The Main Menu command does not have to be recognized in voice controlled devices
that only have one menu, but is a mandatory command for voice controlled devices with submenus.

Changing Voice Controlled Device Names 

[0085] In some cases it may be desirable to change the user-assigned name of a voice controlled device. Referring
now to FIGs. 6A-6B, the syntax of the Change Name command is:

<old name> Change Your Name

This command allows a user to name or rename a voice controlled device. When a voice controlled device is new, it has
at least one default factory programmed appliance name (e.g. Telephone). Most voice controlled devices have the
capability of supporting one or more user-assignable appliance names. A user can name the appliance by saying
"<factory programmed name> Change your name" (e.g. "Telephone change your name"). The voice controlled device will
then ask for the new name to be repeated and then change its name. This process can be repeated once for each
user-assignable name. For example, consider a 4-user telephone that can be assigned four user-assignable appliance
names. A user may execute the four name changes with the commands: "Telephone change your name" followed by
the dialog to set the name for user 1 to (for example) Aardvark. "Telephone change your name" followed by the dialog
to set the name for user 2 to (for example) Barracuda. "Telephone change your name" followed by the dialog to set the
name for user 3 to (for example) Coyote. "Telephone change your name" followed by the dialog to set the name for user
4 to (for example) Doggone. If the user attempted to change a fifth user-assignable name in sequence with the
command "Telephone change your name", it would result in an error message because all available user-assignable
appliance names were assigned. Note that the voice controlled device always responds to the factory programmed
name, even if all user-assigned names are defined. Accordingly, in this example of a fifth attempt, the voice controlled
device still recognizes the "Telephone" factory programmed name - it is just unable to assign a fifth new user-assignable
appliance name.

[0086] An existing user-assignable appliance name can also be changed with the "Change Your Name" command.
Continuing the above example, "Aardvark change your name" would alter the appliance's name for the first user (for
example, it could be changed to Platypus), and leave the other three user names unchanged. Similarly, "Platypus
change your name" followed by a dialog to set the name to "Telephone" would reset the first user name to the
factory-programmed default.
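
The Change Your Name behaviour for the four-user telephone example can be sketched as a small registry with a fixed number of user-assignable slots. This is an interpretation, not the specified implementation: the class and method names are hypothetical, the confirmation dialog is omitted, and treating "reset to the factory default" as freeing the slot is an assumption.

```python
class NameRegistry:
    """Factory name plus a limited number of user-assignable names."""

    def __init__(self, factory_name, max_user_names):
        self.factory_name = factory_name.lower()
        self.max_user_names = max_user_names
        self.user_names = []  # in order of assignment

    def responds_to(self, name):
        name = name.lower()
        return name == self.factory_name or name in self.user_names

    def change_name(self, spoken_name, new_name):
        """Handle '<spoken_name> Change Your Name' with the chosen new name."""
        spoken, new = spoken_name.lower(), new_name.lower()
        if spoken == self.factory_name:
            # Addressed by the factory name: assign a new user name if a slot is free.
            if len(self.user_names) >= self.max_user_names:
                raise ValueError("all user-assignable names are already assigned")
            if new != self.factory_name:
                self.user_names.append(new)
        elif spoken in self.user_names:
            index = self.user_names.index(spoken)
            if new == self.factory_name:
                # Assumption: resetting to the factory default frees the slot.
                del self.user_names[index]
            else:
                self.user_names[index] = new
        else:
            raise KeyError(spoken_name)


registry = NameRegistry("Telephone", max_user_names=4)
for chosen in ["Aardvark", "Barracuda", "Coyote", "Doggone"]:
    registry.change_name("Telephone", chosen)
```

A fifth `registry.change_name("Telephone", ...)` raises an error, while `registry.responds_to("Telephone")` remains true, matching the note that the factory name is always recognized.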

Identification of Voice Controlled Devices 

[0087] As voice controlled devices proliferate, it is important that users be capable of readily identifying what, if any,
voice controlled devices are present when they enter a new environment. For example, a user walks into a hotel room
that has a number of devices. In order to use them, a user needs to know which devices are voice controlled devices.
Additionally, a user needs to know the appliance names in order to properly control them. Besides being audibly
identified, voice controlled devices can be identified visually as well, by using a logo signifying a voice controlled device
utilizing the standard VUI.

[0088] Acoustic identification works when voice controlled devices are actively listening for recognizable commands.
In most cases, this means the voice controlled device is constantly listening and attempting recognition. Typically,
these voice controlled devices will be AC powered, since the power drain from continuous recognition will be
unacceptable for most battery operated voice controlled devices. Referring to FIGs. 6A and 6C, the acoustic
identification is accomplished by a user communicating an identification phrase to command the voice controlled device. The
identification phrase "What Is Out There?" or some other suitable identification phrase may be used for causing the
voice controlled devices to identify themselves.

[0089] The syntax of the standard VUI identification phrase is:

<silence> What Is Out There?

In response to this query, any voice controlled device that hears the question must respond. The typical voice controlled
device's response is a random delay of up to 2 seconds of relative silence, followed by a beep (the standard signal),
and the response "You can call me <name>", where <name> is the factory-programmed name that can be used to address
the voice controlled device. In the telephony voice controlled device example described above, a response might be
"(beep) You can call me Telephone."

[0090] Referring to FIG. 6C, during the random delay of up to 2 seconds, each responding voice controlled device
listens for another voice controlled device's response (specifically, for another voice controlled device's beep). In the
event another voice controlled device starts responding (as evidenced by a beep) during this silence period, the
listening voice controlled device must restart its silence timing after the responding voice controlled device finishes. In the
event two voice controlled devices start responding at the same time (overlapping beeps), they both must back off for
a new randomly selected silence delay. However, this time the random delay may be greater than the first, up to twice
the length of the previous silence delay. In any event, the delay should not exceed 16 seconds. Additional backoff
periods for further conflict resolution are provided if other voice controlled devices respond.
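
The collision handling described above resembles a randomized backoff with a doubling, capped delay window. The sketch below shows only the timing arithmetic, under one reading of the text: that "up to twice the length of the previous silence delay" means the upper bound of the random delay doubles after each collision. The function names are hypothetical.

```python
import random

INITIAL_WINDOW_S = 2.0  # first response: random delay of up to 2 seconds
MAX_WINDOW_S = 16.0     # the delay should not exceed 16 seconds


def next_window(previous_window_s):
    """Upper bound of the random delay to use after a collision."""
    return min(2.0 * previous_window_s, MAX_WINDOW_S)


def pick_delay(window_s, rng=random):
    """Choose a random silence delay between 0 and window_s seconds."""
    return rng.uniform(0.0, window_s)


def backoff_windows(collisions):
    """Delay bounds for the first attempt and each retry after a collision."""
    windows = [INITIAL_WINDOW_S]
    for _ in range(collisions):
        windows.append(next_window(windows[-1]))
    return windows
```

For example, `backoff_windows(4)` yields the bounds 2, 4, 8, 16 and 16 seconds; a device that merely hears another device's beep restarts its current delay rather than doubling it.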

[0091] Referring to FIG. 6A, the syntax of the Request User-Assignable Names command is:

<name> Tell Me Your Name
or
<name> Tell Me Your Names

If security permits, any user-programmed <name> or the default <name> can be used. The Request User-Assignable
Names command is used to ask a voice controlled device to list all the user-programmed <names> that it will respond to.
If security permits, the voice controlled device communicates each user-programmed name in a list fashion. Between
each user-assigned name it pauses for a moment. During this pause a user may communicate a command to the voice
controlled device and it will be executed as if given with that user-programmed <name>. For example, consider the
telephony voice controlled device example above. The command "Telephone Tell Me Your Name" provided after a pause will
cause the telephone to respond by saying "I have been named Aardvark (pause), Barracuda (pause), Coyote (pause),
and Doggone (pause)." During the pause that followed the voice controlled device saying "Coyote", a user may say "Call
Mom", in which case the phone calls user Coyote's Mom (assuming that a phone number for Mom had been previously
stored by user Coyote).

SECURITY CONSIDERATIONS 


[0092] The command for requesting user-assignable names raises the issue of security in the voice controlled
devices. In some cases it is necessary to limit access to a voice controlled device to authorized users. Various methods
of security protection, which are supported by the standard VUI, can be employed in a voice controlled device.
[0093] The simplest and least secure security protection is provided through the VUI's naming capability. In this
case every user is required to choose a unique name for a voice controlled device. The user-assigned appliance names
are kept confidential within the voice controlled device and only changed or deleted by a user. In this manner the
appliance name can be used to provide basic security. However, there are many shortcomings with this approach. First, the
user must typically repeat the name before issuing each command, which makes it easy for someone to overhear the
name, resulting in a loss of security. Second, most voice controlled devices will include a capability for deleting or
changing a user's name for the device. It is preferable to make deletions and changes easy to perform. Additionally,
changes may need to be performed by someone other than that particular user. For example, the user may have
forgotten the name he originally assigned to the voice controlled device, or the user may have stopped using the device
and not be available to delete his settings. In the case of using the appliance name as security, there is an inherent
conflict between the need for ease of use in changing a name and the quality of security.

[0094] A greater level of security can be achieved by requiring the user to say a secret numeric sequence,
password or phrase in order to gain access to the voice controlled device. The login might be required when the user starts
using the voice controlled device after some period of inactivity, or based on some other criteria. A disadvantage of this
approach is that the spoken numeric sequence or phrase might be overheard. Another security alternative is to require
the user to enter the numeric sequence, password, or phrase on a keypad such as optional keypad 311. Although this
introduces additional hardware, it eliminates the risk of a secret code being overheard by another. A variety of other
security options are also possible, including use of a physical key or a security card (e.g. magnetic stripe or smartcard).
[0095] Additional security is provided by automatic cancellation or termination of user access to the voice controlled
device. In some cases access may be automatically cancelled after every command execution. In other cases
automatic cancellation of access may occur following some period of inactivity, power-down or reset, completion of some
operation (e.g. in a phone, at the end of a call), or upon the specific request of a user by use of a "Cancel Access"
command.
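
The automatic cancellation policies listed above can be illustrated with a small access-control object. This is a sketch under stated assumptions: the class, its parameters, and the use of caller-supplied timestamps are invented for the example, and the login step itself (password, keypad, key or card) is reduced to a single `login` call.

```python
class AccessControl:
    """Grants access on login and revokes it automatically."""

    def __init__(self, inactivity_limit_s=None, cancel_after_each_command=False):
        self.inactivity_limit_s = inactivity_limit_s
        self.cancel_after_each_command = cancel_after_each_command
        self._granted = False
        self._last_activity = 0.0

    def login(self, now):
        # Stands in for a successful password, keypad, key or card check.
        self._granted = True
        self._last_activity = now

    def cancel_access(self):
        # Also the natural hook for power-down, reset, or end of a call.
        self._granted = False

    def is_authorized(self, now):
        if (self._granted and self.inactivity_limit_s is not None
                and now - self._last_activity > self.inactivity_limit_s):
            self._granted = False  # inactivity timeout
        return self._granted

    def execute(self, command, now):
        """Return True if the command is allowed to run."""
        if not self.is_authorized(now):
            return False
        self._last_activity = now
        if self.cancel_after_each_command or command.lower() == "cancel access":
            self._granted = False
        return True
```

The two constructor options correspond to the "after every command execution" and "period of inactivity" cases; other triggers would call `cancel_access()` directly.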

APPLICATION-SPECIFIC COMMANDS 

[0096] The standard VUI provides each voice controlled device with a number of application specific commands.
The application specific commands provided by the standard VUI are associated with telephone and answering
machine applications. Additional application specific commands can be programmed for and included in the vocabulary
by a manufacturer.

[0097] General guidelines for developing commands for the standard VUI are as follows. Sub-menus should be
limited in number and organized around logical groups of commands. For example, a telephone TAD might have a main
menu that included telephony functions, a submenu for phonebook management, and another submenu for TAD
functions.

[0098] The number of commands in any menu or submenu should generally be limited to ten or fewer to minimize
complexity. The help function should clearly describe the available commands.
[0099] Complex commands should be broken down into manageably small units. Command phrases should be
selected that ensure high recognition success. The standard VUI commands have been selected to ensure high
recognition accuracy. Care should be exercised when creating a custom vocabulary to avoid using confusable phrases.
[0100] For destructive events (delete, etc.), user confirmation of the correct entry and verification of the operation
should be requested.


TELEPHONY VOCABULARY 

[0101] Referring now to FIGs. 6D-6E, 7, and 8, flow charts for the telephony vocabulary for the standard VUI are
illustrated. The telephony vocabulary is particularly for telephony voice controlled devices such as desktop telephones,
cellular telephones, cellular telephone car kits, and cordless phones. The SRS 204 of the present invention is capable
of recognizing the commands in the telephony vocabulary and converting them into recognized tokens for control of the
telephony voice controlled devices. The telephony vocabulary includes all the standard VUI Core Commands and the
following application specific commands.
[0102] The syntax of the Call command is:

<name> Call <voicetag>
or
<name> Call <digits>

The Call command is used to dial a specific phone number, expressed either as a series of digits or as a phonebook
voicetag. The <digits> can be any list of numeric digits. The telephony voice controlled device allows for the synonyms
"oh" for zero, and "hundred" for zero-zero to be enabled. The sequence of <digits> can contain embedded pauses.
However, if a pause exceeds a programmable duration set by the system designer, the sequence is terminated and the
command is executed. The telephony voice controlled device response
to a Call command should be "Calling <digits>" or "Calling <voicetag>" with the recognized digits or recognized voicetag
voiced to verify accurate recognition. The "Cancel" command can be used to cancel the calling operation in the event
of misrecognition.
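
The digit handling for the Call command, including the optional "oh" and "hundred" synonyms, can be sketched as a simple word-to-digit mapping. The function names below are hypothetical and the sketch works on already-recognized words; detecting the terminating pause is left out.

```python
DIGIT_WORDS = {
    "zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
    "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9",
}
# Synonyms that the device may allow to be enabled.
SYNONYMS = {"oh": "0", "hundred": "00"}


def words_to_digits(words, synonyms_enabled=True):
    """Convert recognized digit words into a dialable digit string."""
    table = dict(DIGIT_WORDS)
    if synonyms_enabled:
        table.update(SYNONYMS)
    digits = []
    for word in words:
        key = word.lower()
        if key not in table:
            raise ValueError(f"not a digit word: {word!r}")
        digits.append(table[key])
    return "".join(digits)


def call_response(target):
    """Confirmation prompt voiced back so the user can verify recognition."""
    return f"Calling {target}"
```

For instance, the spoken words "one eight hundred" become the digits 1800, and the device would answer with `call_response("1800")` so the user can say "Cancel" if the recognition was wrong.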

[0103] The syntax of the Dial command is:

<name> Dial <voicetag>
or
<name> Dial <digits>

The Dial command is the same as the Call command.
[0104] The syntax of the Answer command is:

<name> Answer

This command is used to answer an incoming call. The response prompt is "Go ahead".
[0105] The syntax of the Hangup command is:

<name> Hangup

This command is used to hang up an active call. The response prompt is a high-pitched beep.
[0106] The syntax of the Redial command is:

<name> Redial

This command is used to redial a number. The response is "Redialing <digits>" or "Redialing <voicetag>", depending on
whether the previous Call or Dial command was to <digits> or a <voicetag>. If there was no earlier call made, the response
is "Nothing to redial".
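
The Redial behaviour amounts to remembering the last Call or Dial target. A minimal sketch, with a hypothetical `Dialer` class that only produces the response prompts and does no actual telephony:

```python
class Dialer:
    """Remembers the most recent call target for Redial."""

    def __init__(self):
        self.last_target = None  # digits or voicetag of the previous call

    def call(self, target):
        self.last_target = target
        return f"Calling {target}"

    def redial(self):
        if self.last_target is None:
            return "Nothing to redial"
        return f"Redialing {self.last_target}"
```

Because the target is stored exactly as it was given, the Redial prompt repeats digits after a digit call and the voicetag after a voicetag call.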

[0107] The syntax of the Store command is:

<name> Store

The Store command is in the phonebook submenu and is used to add a new voicetag.
[0108] The syntax of the Delete command is:

<name> Delete

The Delete command is in the phonebook submenu and is used to delete a voicetag.
[0109] The syntax of the Mute command is:

<name> Mute

This command mutes the microphone. The response by the voice controlled device is "Muted".
[0110] The syntax of the Online command is:

<name> Online

This command unmutes the microphone. The response is "Online".
[0111] Prompts can be communicated by the voice controlled devices to request a response from the user. Prompts
may be communicated (i.e. prompting) by a speech synthesizer, playback of pre-recorded speech or other means. The
prompts in the telephone vocabulary include the following context-sensitive help prompts:

"Calling <digits>/<voicetag>"                 | "Please say the name you want to call"                    | "Online"
"Dialing <digits>/<voicetag>"                 | "Please start over"                                       | "one"
"Go ahead"                                    | "My name is now <name>"                                   | "two"
"Goodbye" (for the hangup command)            | [illegible] <voicetag>"                                   | "three"
"Cancelled"                                   | [illegible] understand"                                   | "four"
[illegible] want to delete"                   | [illegible] again"                                        | "five"
"Are you sure you want to delete <voicetag>?" | "Name change canceled"                                    | "six"
"<voicetag> deleted"                          | "The names did not match"                                 | "seven"
"Please say the new name"                     | "Please repeat the number"                                | "eight"
"Please repeat the new name"                  | "The number for <voicetag> is <digits>. Is this correct?" | "nine"
"Please say the number for <voicetag>"        | "The number for <voicetag> has been stored"               | "zero"
"That name is not in the phone book"          | "Do you want to store it now?"                            | "hundred"
[illegible]                                   | "Nothing to redial"                                       | "Star"
                                              |                                                           | "Flash"
                                              |                                                           | "Pound"

[0112] In addition to these prompts, the voice controlled devices can generate a number of different tones or beeps.
These include a medium pitch beep (e.g. 200 millisecond, 500 Hz sine wave), a low pitched beep (e.g. a buzzer sound
or 250 millisecond, low frequency beep signifying erroneous entry) and a high pitched beep (e.g. 200 milliseconds,
1200 Hz sine wave). Other sounds are possible and would be within the intended scope of the present invention.
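
As an illustration of the example tones, the medium and high pitched beeps can be generated as sampled sine waves. The 8000 Hz sample rate, the amplitude, and the function name are assumptions chosen for the sketch; the specification gives only the durations and frequencies.

```python
import math


def sine_beep(frequency_hz, duration_ms, sample_rate=8000, amplitude=0.5):
    """Return a list of samples in [-amplitude, amplitude] for a sine tone."""
    sample_count = int(sample_rate * duration_ms / 1000)
    return [
        amplitude * math.sin(2 * math.pi * frequency_hz * i / sample_rate)
        for i in range(sample_count)
    ]


MEDIUM_BEEP = sine_beep(500, 200)   # medium pitch: 200 ms, 500 Hz
HIGH_BEEP = sine_beep(1200, 200)    # high pitch: 200 ms, 1200 Hz
```

The low pitched beep is described only as a buzzer or a 250 millisecond low frequency tone, so no frequency is assumed for it here.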

Vocabulary For Telephone Answering Voice Controlled Device 

[0113] In addition to the foregoing, application specific commands for the standard VUI enable a user to interface to
a telephone answering voice controlled device using voice commands. A user can manage message functions and
obtain remote access from a telephone answering voice controlled device without using a keypad. The following lists
the additional voice commands to be included in the vocabulary 224 for a telephone answering voice controlled device.



<name> Play new            | <name> Rewind <n>       | <name> Stop
<name> Play all            | <name> Record Greeting  | <name> Play Greeting
<name> Delete this         | <name> Record message   | <name> Room monitor
<name> Delete all messages | <name> Answer On        | <name> Password <password phrase>
<name> Forward <n>         | <name> Answer Off       |

Automobile Control Vocabulary

[0114] Additional specific commands for the standard VUI enable a user to interface to automobile accessories
using voice control. Two primary areas for automotive voice control include the control of interior accessories and
control of entertainment systems. Automotive accessories include environmental controls, windows, door locks, and
interior lights. It is preferable that "mission critical" elements in an automobile, such as steering, braking, acceleration, and
exterior lights, not be controlled by voice due to potential safety concerns if misrecognition occurs. Entertainment
controls are used primarily for a CD player/changer and for the radio.

[0115] The automobile control vocabulary 224 for voice controlled devices includes Air conditioning, Fan speed,
Temperature, Driver window, Passenger window, Left rear window, Right rear window, Windows, Door locks, Wipers,
Low, Medium, High, Increase, Decrease, Set, Reset, Cancel, Clear, Recall, On, Off, Colder, and Warmer.

15 STANDARD USER INTERFACE FUNCTIONS FOR THE API 

[0116] The standard VUI of the present invention includes standard functions for user interactions, which are
accessed by an applications programming interface (API). These standard functions for the API include GETYESNO,
GETRESPONSE, GETRESPONSEPLUS, and LISTANDSELECT, which are used by custom software developers to
develop applications that operate on top of the standard VUI of the present invention. FIGs. 9A-9B, 10A-10C, 11, and
12 are flow charts illustrating the functionality of these standard user interface functions within the standard VUI. Briefly,
the GETYESNO function is for prompting and accepting a positive (Yes) or negative (No) response from a user. The
GETRESPONSE function is for prompting and accepting an input from a user that corresponds to an expected list of
responses. The GETRESPONSEPLUS function is for prompting and accepting input from a user similar to the
GETRESPONSE function but includes enhanced error recovery features. The LISTANDSELECT function provides a user with
a list of choices and allows the user to select one. The operation of the GETYESNO, GETRESPONSE,
GETRESPONSEPLUS, and LISTANDSELECT functions is adapted from "Debouncing the Speech Button: A Sliding Capture
Window Device for Synchronizing Turn-Taking" by Bruce E. Balentine et al, International Journal of Speech Technology, 1997.
FIG. 9A illustrates the use of a Yes/No menu and FIG. 9B illustrates how to resolve a rejection or a bad recognition. FIG. 10A
illustrates the initiation or begin window for the GETRESPONSE and GETRESPONSEPLUS functions. FIG. 10B
illustrates the speech startup or open window functionality for the GETRESPONSE and GETRESPONSEPLUS functions.
FIG. 10C illustrates the end recognition or close window functionality for the GETRESPONSE and
GETRESPONSEPLUS functions. FIG. 11 illustrates the dual capture window functionality for the GETRESPONSEPLUS function. FIG. 12
illustrates the menu list functionality for the LISTANDSELECT function.

[0117] Referring to FIGs. 9A-9B, the GETYESNO user interface function is used to ask the user a question and to
accept a positive or negative response such as "Yes" or "No" (or the equivalent phrases in other languages). The
parameters associated with the GETYESNO are the QUESTION and a TIMEOUT period. The question parameter is a
voice prompt to the user which asks a question that can be answered positively or negatively, such as "yes" or "no". The
TimeOut parameter is the number of seconds to wait for a response before flagging that a response was not detected.
The voice controlled device returns a byte value depending upon the response or outcome. A 0 is returned if a "No"
response is detected. A 1 is returned if a "Yes" response is detected. A 17 is returned if a response was not detected
in the allowed time, indicating a TimeOut error. An 18 is returned if a response was detected but was not recognizable,
indicating an out-of-vocabulary-word error.
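
The GETYESNO return values can be summarized in a short sketch. The return codes 0, 1, 17 and 18 are taken from the description above; everything else is an assumption made for illustration, in particular the `play_prompt` and `listen` callables, which stand in for the device's prompt playback and speech recognition and are not part of the described API.

```python
from enum import IntEnum


class YesNoResult(IntEnum):
    NO = 0
    YES = 1
    TIMEOUT = 17             # no response detected in the allowed time
    OUT_OF_VOCABULARY = 18   # response detected but not recognizable


def get_yes_no(question, timeout_s, play_prompt, listen):
    """Ask a yes/no question and map the outcome to a byte-sized code.

    play_prompt(text) voices the question; listen(timeout_s) returns the
    recognized text, or None if nothing was heard in time (both hypothetical).
    """
    play_prompt(question)
    heard = listen(timeout_s)
    if heard is None:
        return YesNoResult.TIMEOUT
    word = heard.strip().lower()
    if word == "yes":
        return YesNoResult.YES
    if word == "no":
        return YesNoResult.NO
    return YesNoResult.OUT_OF_VOCABULARY
```

A caller might use it for a destructive operation, for example `get_yes_no("Are you sure you want to delete Mom?", 5, speak, recognize)`, where `speak` and `recognize` are whatever playback and recognition hooks the platform provides.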

[0118] Referring to FIGs. 10A-10C, the GETRESPONSE user interface function plays a Prompt to a user that solicits a response and waits for the response. GETRESPONSE looks for a spoken response that matches a topic within a list known as TopicList. GETRESPONSE either returns an array of recognized tokens, or an error indicator. The parameters associated with GETRESPONSE are Prompt, TimeOut, STS_Sound, and TopicList. The Prompt parameter is the initial prompt to be played to the user. The TimeOut parameter is the number of milliseconds to wait for a response before flagging that a response was not detected. The STS_Sound parameter (Spoke-Too-Soon Sound) is the sound or prompt to be played if a user speaks before the Prompt finishes playing. Typically, the STS_Sound will be a short tone or beep sound rather than a spoken phrase. The parameter TopicList is the vocabulary subset for the list of topics which the SRS 204 should use to identify the spoken response. The voice controlled device returns a pointer to an integer array. If the recognition of a response associated with the TopicList was successful, the first element in the array is the number of tokens returned and the following elements in the array are the tokens for each identified speech element (one or more words). Element 1 is n, the number of tokens returned. Elements 2 through n+1 are the token values for each speech element recognized. For example, consider the phrase "Telephone Dial Office". If the token value for the speech element "Telephone" is 7, for the speech element "Dial" is 12, and for the speech element "Office" is 103, then if they are all recognized successfully, the complete array returned would be four elements long with the values 3, 7, 12, 103. If the recognition of the response was not successful, the array is two elements long. The first element is set to zero and the second element indicates the type of error that occurred. In this case, Element 1 is set to 0 indicating that an error was detected. Element 2 is set to 17 indicating that a response was not detected in the allowed time (TimeOut error) or 18 indicating that a response was detected, but it was not recognizable (out-of-vocabulary-word error). The array returned for a timeout error is two elements long with values 0, 17 and the array returned for an out-of-vocabulary-word error is two elements long with values 0, 18.
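
The array layout described for GETRESPONSE can be sketched as follows. This is an assumption-laden illustration in Python, not the device's actual interface: the function name parse_get_response is hypothetical, and the returned integer array is modeled as a plain Python list.

```python
from typing import List, Optional, Sequence, Tuple

TIMEOUT_ERROR = 17            # from the text: no response in the allowed time
OUT_OF_VOCABULARY_ERROR = 18  # from the text: response not recognizable


def parse_get_response(result: Sequence[int]) -> Tuple[List[int], Optional[int]]:
    """Split a GETRESPONSE result array into (tokens, error_code).

    On success the error code is None; on failure the token list is empty.
    """
    if not result:
        raise ValueError("empty result array")
    count = result[0]
    if count == 0:
        # Failure layout: [0, error_code]
        if len(result) != 2:
            raise ValueError("failure array must be two elements long")
        return [], result[1]
    # Success layout: [n, token_1, ..., token_n]
    tokens = list(result[1:1 + count])
    if len(tokens) != count:
        raise ValueError("array is shorter than its token count")
    return tokens, None
```

For the "Telephone Dial Office" example, parse_get_response([3, 7, 12, 103]) yields the tokens 7, 12 and 103 with no error, while parse_get_response([0, 17]) yields an empty token list and the TimeOut error code.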

[0119] Referring to FIG. 11, the GETRESPONSEPLUS user interface function plays a Prompt to a user that solicits a response and waits for the response. GETRESPONSEPLUS is similar to GETRESPONSE in that it plays a Prompt for the user and then waits for a spoken response. However, GETRESPONSEPLUS includes the capability to play prompts to recover from error situations where the user has not spoken or has excessive noise in the background. GETRESPONSEPLUS listens for a spoken response that matches the topics in TopicList. GETRESPONSEPLUS either returns an array of recognized tokens, or an error indicator. The parameters for GETRESPONSEPLUS are Initial_Prompt, TimeOut, STS_Sound, TopicList, MaxTries, Intervene_Prompt, Repeat_Prompt, and the Help_Prompt. The Initial_Prompt parameter is the initial prompt to be played to a user to solicit a response. The TimeOut parameter is the number of milliseconds to wait for a response before flagging that a response was not detected. The STS_Sound prompt is a sound or prompt to be played if the user speaks before the Prompt finishes playing. Typically, the STS_Sound prompt will be a short tone or beep sound rather than a spoken phrase. The parameter TopicList is the vocabulary subset for the list of topics which the SRS 204 should use to identify the spoken response. The MaxTries parameter is the maximum number of times GETRESPONSEPLUS will re-prompt the user in an effort to get a good recognition. If recognition does not occur after MaxTries, GETRESPONSEPLUS will return and indicate an error. The Intervene_Prompt parameter is a prompt played to ask the user to repeat himself (e.g. "There was too much noise. Please repeat what you said."). This prompt is played when there was too much noise during the previous recognition attempt. The Repeat_Prompt parameter is the prompt played to ask the user to repeat what was just said (e.g. "Please repeat what you said"). This prompt is used when a spoke-too-soon error occurred. The Help_Prompt parameter is the prompt played when the user seems to need further instructions, including when the user says nothing. The voice controlled device returns a pointer to an integer array upon completion of the user interface function. If the recognition of a response associated with the TopicList was successful, the first element in the array is the number of tokens returned and the following elements in the array are the tokens for each identified speech element (one or more words). Element 1 is n, the number of tokens returned. Elements 2 through n+1 are the token values for each speech element recognized. For example, consider the phrase "Telephone Dial Office". If the token value for the speech element "Telephone" is 7, for the speech element "Dial" is 12, and for the speech element "Office" is 103, then if they are all recognized successfully, the complete array returned would be four elements long with the values 3, 7, 12, 103. If recognition was not successful, the array is five elements long. The first element is zero. The second element indicates the most recent type of error that occurred. The third through fifth elements indicate the number of times each type of error occurred between when GETRESPONSEPLUS was called and when GETRESPONSEPLUS returned. In this case, Element 1 has a value of 0 indicating that an error was detected. Element 2 has a value of 17 indicating that a response was not detected in the allowed time (TimeOut error), or 18 indicating that a response was detected, but it was not recognizable (out-of-vocabulary-word error), or 19 indicating that a spoke-too-soon error was detected. Element 3 has a value of x indicating the number of times a TimeOut error was detected. Element 4 has a value of y indicating the number of times an out-of-vocabulary-word error was detected. Element 5 has a value of z indicating the number of times a spoke-too-soon error was detected.
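
A sketch of how the GETRESPONSEPLUS result might be decoded is given below, in Python. It assumes the failure layout enumerated element by element in the text (Element 1 through Element 5) and models the integer array as a list. The names ResponsePlusFailure and parse_get_response_plus are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence, Union


@dataclass
class ResponsePlusFailure:
    """Failure summary: Elements 2 through 5 of the returned array."""
    last_error: int               # 17 timeout, 18 out-of-vocabulary, 19 spoke-too-soon
    timeout_count: int            # x in the text
    out_of_vocabulary_count: int  # y in the text
    spoke_too_soon_count: int     # z in the text


def parse_get_response_plus(
    result: Sequence[int],
) -> Union[List[int], ResponsePlusFailure]:
    """Return the recognized tokens, or a failure summary when Element 1 is 0."""
    if not result:
        raise ValueError("empty result array")
    n = result[0]
    if n == 0:
        if len(result) < 5:
            raise ValueError("failure array needs Elements 1 through 5")
        return ResponsePlusFailure(result[1], result[2], result[3], result[4])
    if len(result) < n + 1:
        raise ValueError("array is shorter than its token count")
    return list(result[1:n + 1])
```

The per-type error counts let the calling application decide, for instance, whether a noisy environment or a silent user was the more likely cause of the failure.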

[0120] Referring to FIG. 12, the LISTANDSELECT user interface function first plays a Prompt. Then it plays each prompt in the array ListOfMenuPrompts, pausing after each for a PauseTime. During these pauses, the recognizer listens for a spoken response that matches the topics in TopicList. LISTANDSELECT either returns an array of recognized tokens, or an error indicator. The parameters for LISTANDSELECT include Initial_Prompt, TimeOut, STS_Sound, TopicList, ListOfMenuPrompts, PauseTime, and the Help_Prompt. The Initial_Prompt parameter is the initial prompt to be played to the user. The TimeOut parameter is the number of milliseconds to wait for a response, after playing all the prompts in ListOfMenuPrompts, before flagging that a response was not detected. The STS_Sound parameter is the sound or prompt to be played if the user speaks before a prompt finishes playing. Typically, STS_Sound will be a short tone or beep sound rather than a spoken phrase. The parameter TopicList is the vocabulary subset for the list of topics which the SRS 204 should use to identify the spoken response. The ListOfMenuPrompts parameter is an array of prompts which will be played one at a time. The first element in the array is a count of the number of prompts in ListOfMenuPrompts. The PauseTime parameter is the time to pause after playing each prompt in ListOfMenuPrompts. The PauseTime parameter has a value in milliseconds. The Help_Prompt parameter is the prompt played when the user seems to need further instructions, including when the user says nothing. The voice controlled device returns a pointer to an integer array upon completion of the user interface function. If recognition was successful, the first element in the array is the number of tokens returned, and the following elements in the array are the tokens for each identified speech element (one or more words). Element 1 has a value of n indicating the number of tokens returned. Elements 2 through n+1 have a value of x indicating the token values for each speech element recognized. If recognition was not successful, the array is two elements long. The first element is zero. The second element indicates the type of error that occurred. In this case, Element 1 has a value of 0 indicating that an error was detected. Element 2 has a value of 17 indicating a response was not detected in the allowed time (TimeOut error) or 18 indicating that a response was detected, but it was not recognizable (out-of-vocabulary-word error).
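
The ListOfMenuPrompts layout and the play-then-pause sequence described for LISTANDSELECT can be illustrated as follows. This Python sketch assumes prompts are identified by integer prompt IDs, which the text does not state. The helper names build_menu_prompt_array and playback_schedule are hypothetical.

```python
from typing import List, Sequence, Tuple


def build_menu_prompt_array(prompt_ids: Sequence[int]) -> List[int]:
    """Prefix the prompts with their count, as described for ListOfMenuPrompts."""
    return [len(prompt_ids), *prompt_ids]


def playback_schedule(
    initial_prompt: int, menu_array: Sequence[int], pause_ms: int
) -> List[Tuple[str, int]]:
    """List the steps in order: initial prompt, then each menu prompt
    followed by a listening pause of pause_ms milliseconds."""
    count = menu_array[0]
    prompts = menu_array[1:1 + count]
    steps: List[Tuple[str, int]] = [("play", initial_prompt)]
    for prompt in prompts:
        steps.append(("play", prompt))
        steps.append(("listen", pause_ms))  # recognizer listens during the pause
    return steps
```

For example, playback_schedule(40, build_menu_prompt_array([41, 42]), 500) plays prompt 40, then prompt 41 followed by a 500 millisecond listening pause, then prompt 42 followed by another pause.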

[0121] The ACOUSTICADDWORD function is used by application software to allow a user to add a phrase, also called a voicetag, into the voice controlled device. These phrases can later be recognized using the GETRESPONSE and GETRESPONSEPLUS functions. The ACOUSTICADDWORD function can be used, for example, in a telephone to create dial-by-name entries. By storing a person's name ("John Smith") or identity ("Mother") or other distinguishing phrase ("My office number") with ACOUSTICADDWORD, a person could later call the number by saying "Call John Smith", "Call Mother", or "Call my office number".

[0122] ACOUSTICADDWORD stores the voicetag into a specified TopicList. In its operation, ACOUSTICADDWORD plays a prompt, receives and records a voicetag, verifies the voicetag, then stores the voicetag. AcousticAddWord has the ability to recover from errors by re-checking the voicetag more than once. AcousticAddWord checks and returns an error to the user in the event of duplication. The parameters for ACOUSTICADDWORD include Initial_Prompt, TimeOut, STS_Sound, TopicList, MaxTries, Repeat_Prompt, Intervene_Prompt, Error_Prompt, OK_Prompt, and Help_Prompt. The Initial_Prompt parameter is the initial prompt to be played to a user, such as "Say the new name" in the example of storing names in a voice controlled telephone's phonebook. The TimeOut parameter is the number of milliseconds to wait for a response before flagging that a failure was detected. The STS_Sound (Spoke-Too-Soon Sound) parameter is the sound or prompt to be played if the user speaks before the Prompt finishes playing. Typically, the STS_Sound will be a short tone or beep sound rather than a spoken phrase. The parameter TopicList is the vocabulary subset in which the SRS 204 should store the new voicetag. The MaxTries parameter is the maximum number of times AcousticAddWord will re-prompt the user in an effort to get a good recognition. If recognition does not occur after MaxTries, AcousticAddWord will return an error indication. The Repeat_Prompt parameter is the prompt played to ask the user to repeat what was just said (e.g. "Please repeat what you said"). This prompt is used when a spoke-too-soon error occurred. The Intervene_Prompt parameter is a prompt played to ask the user to repeat himself (e.g. "There was too much noise. Please repeat what you said."). This prompt is played when there was too much noise during the previous recognition attempt. The Error_Prompt parameter is the prompt played when the repeated name does not match the initial name, or if the name is a duplicate (e.g. "Please try again."). The OK_Prompt parameter is the prompt played when the new name has been successfully recorded and stored (e.g. "(name) is now stored in the address book"). The Help_Prompt parameter is the prompt played when the user seems to need further instructions, including when the user says nothing. The voice controlled device returns a pointer to an integer array upon completion of the user interface function. If the recognition of a response associated with the AcousticAddWord was successful, the array is seven elements long. Element 1 is a value of 1 indicating successful recognition. Element 2 is a value indicating the token number assigned by the SRS 204, which corresponds to the voicetag that was stored. Element 3 is a pointer to a recorded copy of the voicetag. Element 4 is a value indicating the number of timeout errors that occurred. Element 5 is a value indicating the number of times there was a failure to match the name. Element 6 is a value indicating the number of times spoke-too-soon occurred. Element 7 is a value indicating the number of times the help prompt was played. If recognition was not successful, the array is six elements long. The first element is zero. The second element indicates the most recent type of error that occurred. The third through fifth elements indicate the number of times each type of error occurred between when AcousticAddWord was called and when AcousticAddWord returned. The sixth element indicates the number of times the help prompt was played. In this case, Element 1 is a value of 0 indicating that an error was detected. Element 2 has a value of 17 indicating that a response was not detected in the allowed time (TimeOut error); 18 indicating that a response was detected, but it was not recognizable (Noise error); 19 indicating that a spoke-too-soon error was detected; 20 indicating a Recognition failure (no match on repeat); or 21 indicating a Voicetag list already full. Element 3 is a value of x indicating the number of times a TimeOut error was detected. Element 4 is a value of y indicating the number of times a recognition error was detected. Element 5 is a value of z indicating the number of times a spoke-too-soon error was detected. Element 6 is a value indicating the number of times the help prompt was played.
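
The seven-element success array and six-element failure array described for ACOUSTICADDWORD might be summarized by application software along the following lines. This is a hedged Python sketch: the function name summarize_add_word and the dictionary keys are hypothetical, the array is modeled as a list of integers, and the pointer in Element 3 is treated as an opaque integer.

```python
from typing import Dict, Sequence, Union

# Error codes for Element 2 of the failure array, as listed in the text.
ADD_WORD_ERRORS = {
    17: "timeout",
    18: "noise",
    19: "spoke too soon",
    20: "recognition failure (no match on repeat)",
    21: "voicetag list already full",
}


def summarize_add_word(result: Sequence[int]) -> Dict[str, Union[bool, int, str]]:
    """Turn an ACOUSTICADDWORD result array into a labeled dictionary."""
    if not result:
        raise ValueError("empty result array")
    if result[0] == 1:
        if len(result) != 7:
            raise ValueError("success array must be seven elements long")
        return {
            "ok": True,
            "token": result[1],         # token number assigned by the recognizer
            "recording": result[2],     # pointer to the recorded voicetag (opaque)
            "timeouts": result[3],
            "match_failures": result[4],
            "spoke_too_soon": result[5],
            "help_played": result[6],
        }
    if len(result) != 6:
        raise ValueError("failure array must be six elements long")
    return {
        "ok": False,
        "error": ADD_WORD_ERRORS.get(result[1], "unknown"),
        "timeouts": result[2],
        "recognition_errors": result[3],
        "spoke_too_soon": result[4],
        "help_played": result[5],
    }
```

A telephone application could use the "error" field to choose a follow-up prompt, for example telling the user that the phonebook is full when code 21 is returned.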

ETIQUETTE FOR VOICE CONTROLLED DEVICES 

[0123] The standard VUI includes an etiquette for voice controlled devices. Generally, voice controlled devices (also referred to as machines) should conduct themselves like well-behaved guests.

[0124] However, human factors and human issues involved in living with voice controlled devices are largely unexplored. In designing voice controlled devices, the following suggestions should be considered.

Machine Requests to Humans

[0125] Machines can ask humans to do things. Any request should be polite. For example, a voice activated cellular telephone might ask to be placed in its charger when its batteries are running low. Humans should always have the option to refuse a machine's request, and the machine should politely accept that, unless the machine considers the situation threatening to human life or valuable data, in which case its protests can be more urgent.

Machines That Use the Telephone On Their Own 

[0126] If a voice controlled device answers the telephone, or places a call to a human user, it should clearly identify itself as a machine if there is any risk of it being considered human.

Recording User Speech 

[0127] No machine should record or transcribe a human user's conversations unless those humans present are aware that this is occurring.

Volume Levels 

[0128] Machines should modulate their volume levels in response to ambient noise levels, unless specifically overridden by a human. Machines should be sensitive to when humans want them to be silent (for example, when humans are sleeping). Machines shouldn't babble needlessly, and should permit a user barge-in as a means to silence them.

Machine-to-Machine Communication

[0129] FIG. 13 is a block diagram of a pair of voice controlled devices 102M and 102N (each also referred to as a machine) communicating, neither, one or both of which could be using the standard voice user interface 500 of the present invention in the communication environment 1300. Voice controlled devices can talk to each other to find out what other voice controlled devices are present, what kinds of information they understand, and to exchange information. For example, a voice controlled TV may ask a voice controlled VCR about necessary settings for it to operate. Machine-to-machine communication between voice controlled devices occurs in both audible and non-audible formats. Essentially, machine-to-machine communication using speech may occur over any speech-compatible media, including sound waves through air, conventional telephone links, Internet voice links, radio voice channels, and the like. Machine-to-machine communication can occur where none of the machines, some of the machines, or all of the machines include the VUI of the present invention.

[0130] Using the standard VUI, a voice controlled device can locate other voice controlled devices within a communications environment in a number of ways. These include overhearing a human interact with another machine, overhearing a machine interact with another machine, explicitly requesting nearby machines to identify themselves by using the identification phrase "(silence) What is out there?", explicitly seeking a specific class of machines (e.g. all clocks) by addressing them by a name category "(silence) Clock are you out there?", or explicitly seeking a specific machine (e.g. a clock named Socrates) by addressing it by name "(silence) Socrates are you out there?".

[0131] In the first two cases, the process of listening to other conversations would reveal the other machines' names. In the other three cases, the machines within earshot that respond to the "are you out there" command would respond with their names. In the last two cases, the "What is out there?" command is restricted to certain classes of machines and a specific named machine, thereby limiting the number of machines that will respond to the command. Once the name of the target voice controlled device is known, the initiating voice controlled device can issue other commands (e.g. "Socrates what time is it?") to the other.
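
The three explicit discovery queries described above can be sketched from the point of view of a listening device. The Python example below is illustrative only: it assumes the recognizer has already detected the leading silence and delivered the utterance as text, which the specification does not require. The function name should_respond is hypothetical.

```python
def should_respond(utterance: str, name: str, category: str) -> bool:
    """Decide whether a device with the given name and name category
    should answer a discovery query with its own name."""
    text = utterance.strip().rstrip("?").strip().lower()
    if text == "what is out there":
        return True  # general query: every machine within earshot answers
    suffix = " are you out there"
    if text.endswith(suffix):
        addressee = text[: -len(suffix)].strip()
        # Answer if addressed by name category or by specific name.
        return addressee in (name.lower(), category.lower())
    return False  # not a discovery query
```

A clock named Socrates would answer "What is out there?", "Clock are you out there?" and "Socrates are you out there?", but would stay silent for "Plato are you out there?".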

[0132] In some cases, a voice controlled device may need to talk to another voice controlled device, one or both of which may not adhere to the above protocol. In these cases, the machines can be explicitly programmed to issue the correct commands and recognize appropriate responses. A simple example of this interaction would be a voice controlled device with voice recognition capability and a telephone voice interface dialing a voice-based service such as a spoken report of the time, and simply capturing the desired data (the time).

[0133] The preferred embodiments of the present invention for METHOD AND APPARATUS FOR STANDARD VOICE USER INTERFACE AND VOICE CONTROLLED DEVICES are thus described. While the preferred embodiments of the present invention utilize a speaker independent voice recognition system, the present invention is also compatible with speaker dependent voice recognition systems. While the present invention has been described in particular embodiments, the present invention should not be construed as limited by such embodiments, but rather construed according to the claims that follow below.

Claims

1. A voice controlled device, comprising:

a processor;
a processor readable storage medium;
code recorded in the processor readable storage medium to store at least one user assignable appliance name in the processor readable storage medium;
code recorded in the processor readable storage medium to recognize the at least one user assignable appliance name;
code recorded in the processor readable storage medium to recognize a command; and
code recorded in the processor readable storage medium to control the voice controlled device in response to recognizing the user assignable appliance name and the command.

2. The voice controlled device of claim 1, wherein,

the user assignable appliance name and the command are provided using audible speech.

3. The voice controlled device of claim 1, wherein,

the user assignable appliance name and the command are provided using non-audible speech.

4. The voice controlled device of claim 1 further comprising:

code recorded in the processor readable storage medium to store personal preferences of the voice controlled device associated with the at least one user assignable appliance name; and
code recorded in the processor readable storage medium to personalize the voice controlled device to the stored personal preferences associated with the at least one user assignable appliance name upon recognition of the at least one user assignable appliance name.

5. The voice controlled device of claim 1 further comprising:

code recorded in the processor readable storage medium to store a default appliance name associated with the voice controlled device;
code recorded in the processor readable storage medium to recognize the default appliance name associated with the voice controlled device; and
wherein,
code recorded in the processor readable storage medium to control the voice controlled electronic device is further responsive to recognizing the default appliance name and the command.

6. The voice controlled device of claim 5, wherein,

the default appliance name associated with the voice controlled device is factory assignable.

7. The voice controlled device of claim 5, wherein,

the default appliance name associated with the voice controlled device is factory and user assignable.

8. A method of controlling a voice controlled device, the method comprising:

providing a voice controlled device having a speech recognition system for recognizing speech;
storing at least one user assignable appliance name into the voice controlled device;
communicating a communicated appliance name and a command to the voice controlled device; and
controlling the voice controlled device if the communicated appliance name is recognized as matching the at least one user-assignable appliance name and the command is recognized by the voice controlled device.

9. The method of claim 8 for activating a voice controlled device, wherein,

the communicated appliance name and the command are communicated using audible speech.

10. The method of claim 8 for activating a voice controlled device, wherein,

the communicated appliance name and the command are communicated using non-audible speech.

11. A method of controlling a voice controlled device, the method comprising:

providing a voice controlled device having a speech recognition system for recognizing speech;
storing a default appliance name into the voice controlled device;
communicating a communicated name and a command to the voice controlled device; and
controlling the voice controlled device if the communicated name is recognized as matching the default appliance name and the command is recognized by the voice controlled device.

12. The method of claim 11 for activating a voice controlled device, wherein,

the communicated appliance name and the command are communicated using audible speech.

13. The method of claim 11 for activating a voice controlled device, wherein,

the communicated appliance name and the command are communicated using non-audible speech.

14. A method for activating a voice controlled device, the method comprising:

providing a voice controlled device having a speech recognition system for recognizing speech;
storing a default appliance name into the voice controlled device;
storing at least one user assignable appliance name into the voice controlled device;
communicating a communicated name and a command to the voice controlled device; and
controlling the voice controlled device if the communicated name is recognized as matching the at least one user assignable appliance name or the default appliance name and the command is recognized by the voice controlled device.

15. A method of assigning a new name to a voice controlled device, the method comprising:

providing a voice controlled device having a speech recognition system for recognizing speech;
activating the voice controlled device; and
communicating a new name to the voice controlled device at least once.

16. The method of claim 15 for assigning a new name to a voice controlled device, wherein,

the voice controlled device is activated by communicating a current appliance name and a change name command.

17. The method of claim 15 for assigning a new name to a voice controlled device, wherein,

the new name is communicated using audible speech.

18. The method of claim 15 for assigning a new name to a voice controlled device, wherein,

the new name is communicated using non-audible speech.

19. The method of claim 15 for assigning a new name to a voice controlled device, wherein:

the voice controlled device includes prompting capability and the voice controlled device communicates audible prompts to a user in order to request communication from the user of the new name.

20. The method of claim 15 for assigning a new name to a voice controlled device, wherein:

the voice controlled device includes prompting capability and the voice controlled device communicates non-audible prompts to another voice controlled device in order to request communication from the device of the new name.

21. A first voice controlled device capable of operating in a communication environment with at least one other voice controlled device, the first voice controlled device comprising:

a processor;
a processor readable storage medium;
code recorded in the processor readable storage medium to store a plurality of user assignable appliance names in the processor readable storage medium for activating the voice controlled device;
code recorded in the processor readable storage medium to recognize the plurality of user assignable appliance names associated with the one voice controlled device;
code recorded in the processor readable storage medium to recognize a command; and
code recorded in the processor readable storage medium to control the voice controlled electronic device in response to recognizing one of the plurality of user assignable appliance names and the command.

22. The first voice controlled device of claim 21 capable of operating in a communication environment with at least one other voice controlled device, wherein,

the user assignable appliance names and the command are provided using audible speech.

23. The first voice controlled device of claim 21 capable of operating in a communication environment with at least one other voice controlled device, wherein,

the user assignable appliance names and the command are provided using non-audible speech.

24. The first voice controlled device of claim 21 capable of operating in a communication environment with at least one other voice controlled device, the first voice controlled device further comprising:

code recorded in the processor readable storage medium to store personal preferences of the voice controlled device associated with the at least one user assignable appliance name; and
code recorded in the processor readable storage medium to personalize the voice controlled device to the stored personal preferences associated with the at least one user assignable appliance name upon recognition of the at least one user assignable appliance name.

25. The first voice controlled device of claim 21 capable of operating in a communication environment with at least one other voice controlled device, the first voice controlled device further comprising:

code recorded in the processor readable storage medium to store a default appliance name associated with the voice controlled device;
code recorded in the processor readable storage medium to recognize the default appliance name associated with the voice controlled device; and wherein,
code recorded in the processor readable storage medium to control the voice controlled electronic device is further responsive to recognizing the default appliance name and the command.

26. The first voice controlled device of claim 25 capable of operating in a communication environment with at least one other voice controlled device, wherein,

the default appliance name associated with each of the voice controlled devices is factory assignable.

27. The first voice controlled device of claim 25 capable of operating in a communication environment with at least one other voice controlled device, wherein

the default appliance name associated with each of the voice controlled devices is factory and user assignable.

28. The first voice controlled device of claim 21 capable of operating in a communication environment with at least one other voice controlled device, the first voice controlled device further comprising:

a security means to protect each voice controlled device from unauthorized use.

29. The first voice controlled device of claim 24 capable of operating in a communication environment with at least one other voice controlled device, the first voice controlled device further comprising:

a security means to protect each voice controlled device from unauthorized use.

30. The first voice controlled device of claim 27 capable of operating in a communication environment with at least one other voice controlled device, the first voice controlled device further comprising:

a security means to protect each voice controlled device from unauthorized use.